
OTA-1965: WIP: pkg/cvo: Add Lightspeed proposal integration#1379

Draft
harche wants to merge 1 commit into openshift:main from harche:lightspeed-proposals

Conversation

Contributor

@harche harche commented Apr 21, 2026

Summary

  • Adds LightspeedProposals feature gate integration — when enabled, CVO creates Proposal CRs (agentic.openshift.io/v1alpha1) after discovering available updates
  • New pkg/readiness/ package runs 9 parallel cluster readiness checks (operator health, etcd, nodes, PDBs, API deprecations, CRD compat, network, OLM lifecycle, cluster conditions) and embeds the results in the proposal
  • New pkg/proposal/ package handles proposal creation with deterministic naming and AlreadyExists-based dedup
  • Feature-gated install manifests for Agent, Workflow, and system prompt ConfigMap (annotated with release.openshift.io/feature-gate: LightspeedProposals)
  • Gated behind CustomNoUpgrade feature set — no impact on default clusters

Note: openshift/api dependency uses a fork (harche/api) with the LightspeedProposals feature gate definition until the upstream PR lands.

Test plan

  • Unit tests for pkg/proposal/ (SelectTarget, dedup, buildRequest, classifyUpdate, sanitize, readSystemPrompt)
  • Unit tests for pkg/readiness/ (all 9 checks, RunAll orchestration, JSON marshaling, client helpers)
  • Unit tests for pkg/featuregates/ (gate parsing, feature change stopper)
  • Integration test in test/cvo/readiness.go (validates checks against real cluster with ground-truth comparisons)
  • E2E verified on 4.21.5 cluster: readiness checks 9/9 ok, proposal created, dedup works
  • CI validation

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Lightspeed Proposals: operator can generate Proposal resources for discovered updates (includes proposal/agent/workflow CRDs, agent/skill configs, and an upgrade-advisor prompt).
    • Rich readiness reports: concurrent checks added for cluster conditions, node capacity, operator health, etcd, network/proxy/TLS, CRD compatibility, OLM lifecycle, PDB drain, and API deprecations.
  • Documentation

    • Developer guide for running Lightspeed proposals and operational gotchas.
  • Tests

    • Extensive unit and integration tests for readiness checks and proposal logic.

Contributor

openshift-ci Bot commented Apr 21, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2026
Contributor

openshift-ci Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harche
Once this PR has been reviewed and has the lgtm label, please assign fao89 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai Bot commented Apr 21, 2026

Walkthrough

CVO now conditionally creates Lightspeed Proposal CRs when updates are found (gated by LightspeedProposals). A concurrent readiness subsystem (9 checks) produces structured JSON used in proposals. New proposal/creator package, readiness framework, checks, CRDs, Agent/Workflow manifests, tests, and docs were added.

Changes

Cohort / File(s) Summary
Feature gate & CVO integration
pkg/featuregates/featuregates.go, pkg/cvo/cvo.go, pkg/cvo/availableupdates.go, pkg/cvo/status_test.go
Adds LightspeedProposals() gate, CVO gains proposalCreator, gates proposal creation (non-HyperShift), and invokes best-effort proposal path from available-updates sync; small test helper added.
Proposal package & tests
pkg/proposal/proposal.go, pkg/proposal/proposal_test.go
New package to select target versions, classify updates, build deterministic proposal names and request text (optional system prompt embedding), and create Proposal CRs via dynamic client; comprehensive unit tests with fake dynamic client.
Readiness framework core
pkg/readiness/check.go, pkg/readiness/client.go, pkg/readiness/client_test.go, pkg/readiness/check_test.go
New Check interface, CheckResult/Output/Meta types, RunAll orchestration with per-check timeouts, GVRs and dynamic-client helpers, nested accessors, condition parsing, version compare, and unit tests.
Readiness checks implementations
pkg/readiness/cluster_conditions.go, pkg/readiness/node_capacity.go, pkg/readiness/pdb_drain.go, pkg/readiness/crd_compat.go, pkg/readiness/etcd_health.go, pkg/readiness/network.go, pkg/readiness/olm_lifecycle.go, pkg/readiness/operator_health.go, pkg/readiness/api_deprecations.go
Nine readiness checks added that query cluster resources (ClusterVersion, Nodes, PDBs, CRDs, etcd pods/operator, Network/Proxy/APIServer, OLM lifecycle, operator health, API deprecations) and return structured results/summaries; per-check error handling and section errors implemented.
Readiness tests & integration
pkg/readiness/checks_test.go, pkg/readiness/olm_lifecycle_test.go, test/cvo/readiness.go
Extensive unit tests for checks using fake dynamic client and unstructured objects; Ginkgo integration test exercising RunAll end-to-end against cluster-like clients.
Agent/Workflow/Prompt manifests & CRDs
install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml, install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml, install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml, install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml, install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml, install/0000_00_cluster-version-operator_47_lightspeed-crd-workflows.yaml
Adds ConfigMap prompt, Agent and Workflow resources, and CRDs for Proposal/Agent/Workflow (OpenAPI schemas, printer columns, status subresource and validations) to support Lightspeed agent workflows and proposals.
Docs & Go mod
AGENTS.md, go.mod
Updates AGENTS.md with Lightspeed Proposal integration, dev workflow, and gotchas; adds a replace directive in go.mod.

Sequence Diagram

sequenceDiagram
    participant CVO as Cluster Version Operator
    participant Readiness as Readiness System
    participant Creator as Proposal Creator
    participant K8s as Kubernetes API

    CVO->>K8s: Discover available updates
    CVO->>CVO: Check LightspeedProposals gate & non-HyperShift
    alt gate enabled & not HyperShift
        CVO->>Readiness: RunAll(current, target)
        par Concurrent readiness checks
            Readiness->>K8s: List/Get cluster resources (Nodes, PDBs, CRDs, CSVs, Subscriptions, ClusterOperator, ClusterVersion, Network, APIRequestCount, etc.)
        end
        Readiness-->>CVO: Aggregated readiness JSON
        CVO->>Creator: MaybeCreateProposal(current, target, readiness JSON)
        Creator->>K8s: Create Proposal CR (`proposals.agentic.openshift.io`)
        K8s-->>Creator: Created / AlreadyExists / Error
        Creator-->>CVO: Creation result
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 32.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Microshift Test Compatibility ⚠️ Warning Test suite uses multiple MicroShift-incompatible OpenShift APIs (ClusterVersion, ClusterOperator, Network, etcd pods, MachineConfigPool, OLM resources) with no protective mechanisms. Add [apigroup:config.openshift.io] tag to Describe block or guard with exutil.IsMicroShiftCluster() check to skip on MicroShift.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main addition: Lightspeed proposal integration in the Cluster Version Operator package. It is clear, specific, and accurately reflects the primary change across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names in the pull request are static, descriptive, and deterministic with no dynamic values like pods, timestamps, or UUIDs.
Test Structure And Quality ✅ Passed Ginkgo integration test properly separates concerns into nine distinct blocks, each testing single behavior with meaningful failure messages. Unit tests follow standard Go patterns. Test suite demonstrates good structure consistent with repository.
Single Node Openshift (Sno) Test Compatibility ✅ Passed New readiness.go test validates cluster readiness checks that are topology-agnostic and work on both multi-node and SNO clusters without making assumptions about node count, MCP structure, or requiring multiple pods for functionality.
Topology-Aware Scheduling Compatibility ✅ Passed PR introduces only custom resource definitions and readiness checks without adding pod scheduling constraints, topology-dependent affinity rules, or workload deployments.
Ote Binary Stdout Contract ✅ Passed PR adds library code using klog in runtime operations, not process-level code. CVO already uses klog throughout; main must be pre-configured for stderr. Test code is isolated.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed The test file contains no IPv4 hardcoded addresses, IPv4-specific parsing, or external connectivity requirements. It exclusively uses cluster-internal Kubernetes APIs.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 20

🧹 Nitpick comments (6)
pkg/readiness/check_test.go (1)

150-224: This test doesn't actually exercise RunAll.

The body manually iterates checks and recomputes CheckResult values — it's a copy of the production logic, so if RunAll's orchestration (e.g. per-check timeout, concurrency, elapsed accounting, partial-data handling at lines 174-178) regresses, this test won't catch it. The origAllChecks := AllChecks + _ = origAllChecks dance on Lines 155-160 makes this explicit: the test acknowledges it can't substitute checks and then proceeds anyway.

Consider one of:

  • Refactor pkg/readiness/check.go so RunAll accepts []Check (or AllChecks is a package-level var that tests can override), then call the real RunAll with [okCheck, failCheck, partialCheck] and assert on the returned *Output (including Meta.ChecksOK/ChecksErrored and per-check Data/Error preservation, which is the behavior this file claims to cover).
  • Or delete this test and rely on TestAllChecksReturnsExpectedCount + checks_test.go for coverage.

As per coding guidelines (**/*_test.go): "Don't introduce single-use variables just to name an intermediate value" — origAllChecks is a flag for the dead-code nature of the test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/check_test.go` around lines 150 - 224, The test
TestRunAllMixedResults is not exercising the real RunAll orchestration (it
re-implements the logic and only toys with AllChecks), so either make RunAll
testable by accepting an injected []Check or make AllChecks a package-level
variable tests can override and then call RunAll with []Check{okCheck,
failCheck, partialCheck} and assert on the returned *Output (including
Meta.ChecksOK/Meta.ChecksErrored and per-check Data/Error preservation), or
remove the dead test; update the test to call RunAll (or replace it) instead of
manually iterating checks so timeouts, concurrency, elapsed accounting and
partial-data handling in RunAll are actually verified.
pkg/cvo/cvo.go (1)

400-406: Consider storing the dynamic client on Operator rather than routing it through proposalCreator.DynamicClient().

Line 1238 retrieves the dynamic client via optr.proposalCreator.DynamicClient() purely so readiness.RunAll can use it. That forces proposal.Creator to leak its internal client as a public accessor for an unrelated caller. Initializing and keeping dynamicClient on Operator (and passing it into NewCreator) keeps the coupling one-way.

 	if optr.shouldCreateLightspeedProposals() {
-		if dynamicClient, err := dynamic.NewForConfig(restConfig); err != nil {
+		dynamicClient, err := dynamic.NewForConfig(restConfig)
+		if err != nil {
 			klog.Warningf("Failed to create dynamic client for LightspeedProposal: %v", err)
 		} else {
+			optr.dynamicClient = dynamicClient
 			optr.proposalCreator = proposal.NewCreator(dynamicClient, proposal.DefaultConfig())
 		}
 	}

Then maybeCreateLightspeedProposal uses optr.dynamicClient directly and proposal.Creator no longer needs a DynamicClient() getter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cvo/cvo.go` around lines 400 - 406, Store the dynamic client on the
Operator (optr) instead of relying on proposal.Creator.DynamicClient(): when
creating the proposal creator in the init block where
proposal.NewCreator(dynamicClient, ...) is called, assign dynamicClient to a new
optr.dynamicClient field and pass that same client into proposal.NewCreator;
update maybeCreateLightspeedProposal and readiness.RunAll callers to use
optr.dynamicClient directly and remove the DynamicClient() accessor from
proposal.Creator so the proposal package no longer exposes its internal client.
test/cvo/readiness.go (2)

20-74: Add Ginkgo Labels and run RunAll once per suite, not per It.

Two concerns:

  1. Missing Labels. As per coding guidelines (test/**/*.go): "Use Ginkgo Labels to mark test categories (e.g., TechPreview, serial)". Since Lightspeed proposals are gated behind CustomNoUpgrade/LightspeedProposals, this suite should carry at least a TechPreview label (and likely Serial — it reads cluster-wide state that other parallel tests can perturb). Without a label, the test will run on every CI job including default-featureset runs where the prerequisites aren't guaranteed.

  2. readiness.RunAll(...) is invoked in every It block (lines 57, 65, 92, 121, 150, 165, 186, 203, 213) — that's 9 live-cluster executions × 9 checks each = 81 round-trips against the apiserver per suite run, with non-negligible latency. Worse, the "ground truth" List calls happen at a different time than the readiness run, so node/pod counts can legitimately drift between the two snapshots and cause flakes.

Cache a single *readiness.Output in a shared variable (either in BeforeEach, or once in BeforeAll if the suite tolerates ordering), and have each It only consume it.

♻️ Proposed shape
-var _ = g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks`, func() {
+var _ = g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks`,
+	g.Label("TechPreview", "Serial"), func() {
 	var (
 		dynamicClient  dynamic.Interface
 		kubeClient     kubernetes.Interface
 		configClient   *configv1client.ConfigV1Client
 		ctx            = context.TODO()
 		currentVersion string
 		targetVersion  string
+		output         *readiness.Output
 	)
 
 	g.BeforeEach(func() {
 		...
+		output = readiness.RunAll(ctx, dynamicClient, currentVersion, targetVersion)
 	})

Then drop the repeated readiness.RunAll(...) calls inside each It and reference output directly. Also, per guidelines, remember to run make update so .openshift-tests-extension metadata picks up the new test names/labels.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cvo/readiness.go` around lines 20 - 74, Add Ginkgo labels to the suite
(e.g., "TechPreview" and "Serial" or the appropriate
"CustomNoUpgrade"/"LightspeedProposals" labels) and stop calling
readiness.RunAll(...) inside every It; instead, run readiness.RunAll(ctx,
dynamicClient, currentVersion, targetVersion) once and store its result in a
suite-scoped variable (e.g., var output *readiness.Output) populated in
BeforeEach or a BeforeAll so each It reads from that cached output; update all
It blocks to reference the cached output and remove the repeated RunAll calls,
then run make update to refresh test metadata.

192-194: Unsafe type assertion on JSON-adjacent data.

result.Data["blocking_pdbs"].([]map[string]any) will panic (not fail gracefully) if the check returned nil, returned a different slice shape, or errored. On a healthy cluster with zero PDBs, the check may return an empty []map[string]any or an untyped nil; either way, a panicked test provides less signal than a failed assertion.

-		blockingPDBs := result.Data["blocking_pdbs"].([]map[string]any)
-		o.Expect(len(blockingPDBs)).To(o.Equal(expectedBlocking),
-			"blocking PDB count should match actual blocking PDBs")
+		blockingPDBs, ok := result.Data["blocking_pdbs"].([]map[string]any)
+		o.Expect(ok).To(o.BeTrue(), "blocking_pdbs should be []map[string]any, got %T", result.Data["blocking_pdbs"])
+		o.Expect(len(blockingPDBs)).To(o.Equal(expectedBlocking),
+			"blocking PDB count should match actual blocking PDBs")

Same pattern applies to result.Data["summary"].(map[string]any) on Line 125.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cvo/readiness.go` around lines 192 - 194, The code does an unsafe type
assertion on result.Data["blocking_pdbs"] (and similarly result.Data["summary"])
which can panic; change to a safe assertion using the comma-ok pattern (e.g., v,
ok := result.Data["blocking_pdbs"]; cast, ok2 := v.([]map[string]any)) or a type
switch, assert with o.Expect that the value exists/is the expected type, and if
nil treat it as an empty slice/map before using len or indexing; update the
references to blockingPDBs and the summary extraction to use these safe checks
so tests fail with assertions instead of panicking.
pkg/readiness/client_test.go (1)

102-126: Empty-map subtest asserts nothing; simplify substring search.

With contains: []string{} the inner loop never runs, so the "empty" case trivially passes regardless of what FormatLabelSelector returns. Either assert the expected output explicitly (e.g. got == ""), or drop the case. Also, the manual index walk at Lines 114-119 is just strings.Contains(got, s).

♻️ Proposed simplification
-		{
-			name:     "empty",
-			labels:   map[string]string{},
-			contains: []string{},
-		},
+		{
+			name:   "empty",
+			labels: map[string]string{},
+			want:   "",
+		},
...
-			got := FormatLabelSelector(tt.labels)
-			for _, s := range tt.contains {
-				found := false
-				for i := 0; i <= len(got)-len(s); i++ {
-					if got[i:i+len(s)] == s {
-						found = true
-						break
-					}
-				}
-				if !found {
-					t.Errorf("FormatLabelSelector(%v) = %q, want to contain %q", tt.labels, got, s)
-				}
-			}
+			got := FormatLabelSelector(tt.labels)
+			for _, s := range tt.contains {
+				if !strings.Contains(got, s) {
+					t.Errorf("FormatLabelSelector(%v) = %q, want to contain %q", tt.labels, got, s)
+				}
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/client_test.go` around lines 102 - 126, The "empty" subtest
currently asserts nothing because contains: []string{} makes the inner loop
skip; update the test to use strings.Contains(got, s) instead of the manual
index walk and for the "empty" case assert the exact expected output (e.g. for
the test case with name "empty" and labels map[string]string{} add an explicit
check like if got != "" { t.Errorf(...)}), or drop that case—locate the
table-driven tests around FormatLabelSelector in pkg/readiness/client_test.go
and modify the test loop to call strings.Contains and add the explicit equality
assertion for the "empty" entry.
pkg/readiness/api_deprecations.go (1)

21-30: Use typed Kubernetes error checks for missing resources.

strings.Contains(err.Error(), "not found") is fragile and can miss error text variations or localization. Prefer apierrors.IsNotFound(err) to properly check for resource-not-found errors returned by the Kubernetes client.

♻️ Proposed refactor
 import (
 	"context"
 	"fmt"
-	"strings"
 
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
 	"k8s.io/client-go/dynamic"
 )
@@
 	arcs, err := ListResources(ctx, dc, GVRAPIRequestCount, "")
 	if err != nil {
 		// APIRequestCount may not be available on all clusters
-		if strings.Contains(err.Error(), "not found") {
+		if apierrors.IsNotFound(err) {
 			result["warning"] = "APIRequestCount resource not available"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/api_deprecations.go` around lines 21 - 30, Replace the fragile
string check in the error handling block inside the function that lists
APIRequestCounts (the code checking if strings.Contains(err.Error(), "not
found")) with the typed Kubernetes error check apierrors.IsNotFound(err); import
"k8s.io/apimachinery/pkg/api/errors" (alias apierrors if not already) and keep
the existing behavior of returning the warning map and nil error when the
resource is absent, otherwise return the wrapped fmt.Errorf as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Around line 213-214: Replace the raw cross-build command for the
cluster-version-operator with the supported repository build target: remove the
CGO_ENABLED/GOOS/GOARCH go build line and instead instruct users to run the
repository's make build target (e.g., "make build") which produces the CVO
binary into the _output/ directory; if a raw cross-build must be kept, add an
explicit justification explaining why the dev flow cannot use "make build" and
document expected outputs and environment differences.
- Around line 237-244: Replace the hard-coded personal image reference
quay.io/harpatil/cvo-lightspeed:${TAG} with a configurable IMAGE
variable/placeholder used consistently across the build, push and oc patch
commands (the docker build, docker push lines and the value field in the oc
patch for deployment cluster-version-operator). Update the docs to reference
IMAGE or ${IMAGE} (and document that contributors should set IMAGE to their own
registry/tag) so the build/push and the patch operation use IMAGE instead of the
personal quay.io path.

In `@go.mod`:
- Line 112: The go.mod replace that points github.com/openshift/api to your
personal fork must be removed; edit go.mod to delete the line "replace
github.com/openshift/api => github.com/harche/api ..." and restore use of the
upstream module, then ensure the code referencing FeatureGateLightspeedProposals
in pkg/featuregates/featuregates.go either imports the upstream
github.com/openshift/api package (if the constant exists upstream) or you
instead vendor/define the specific constant locally or upstream the symbol to
the official repo before merging.

In `@install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml`:
- Around line 1-29: The ota-advisory-prompt ConfigMap (metadata name:
ota-advisory-prompt, namespace: openshift-lightspeed) is missing required
cluster-profile annotations and a kubernetes.io/description; add the standard
include.release.openshift.io/* profile selectors (e.g.
include.release.openshift.io/self-managed-high-availability and other profiles
your repository requires) under metadata.annotations and add
kubernetes.io/description with a short purpose string describing this
ConfigMap’s role as the OTA advisory prompt used by Lightspeed agents; apply the
same annotation/description pattern to the related manifests that reference this
ConfigMap (the resources defined in 51_lightspeed-agents.yaml and
52_lightspeed-workflows.yaml) so the CVO graph selector will correctly install
these resources on profile-scoped clusters.

In `@install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml`:
- Around line 11-20: Replace the mutable, external skill image reference
quay.io/harpatil/ocp-skills:latest with an immutable, release-managed image
(either a specific tag or an image digest) so the manifest is deterministic and
disconnected-cluster friendly; locate the skills entry that contains the image
key (the list under "skills" with image: quay.io/harpatil/ocp-skills:latest) and
update that image value to an approved, immutable release image (e.g., use your
release payload registry or a content-addressable digest) and ensure any
CI/release docs are updated to track the chosen immutable reference.
- Around line 4-7: This manifest for the Agent (metadata.name: ota-advisor) is
missing the required cluster-profile include annotations and a description; add
include.release.openshift.io/<profile-name>: "true" entries for every cluster
profile this feature should be included in (use the same profile keys your repo
uses, e.g., include.release.openshift.io/cluster: "true" or
include.release.openshift.io/optional: "true" as appropriate) alongside the
existing release.openshift.io/feature-gate annotation, and add a
kubernetes.io/description annotation that succinctly explains the Agent’s
purpose (e.g., "OTA Advisor Agent: collects and reports OTA proposal
recommendations for LightspeedProposals feature"). Ensure you update the
annotations block under metadata for the resource named ota-advisor.

In `@install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml`:
- Around line 4-7: The manifest for the Workflow resource named ota-advisory is
missing required install annotations and a description; update the
metadata.annotations to add the appropriate include.release.openshift.io
cluster-profile keys (e.g.,
include.release.openshift.io/self-managed-high-availability: "true" and any
necessary exclusion annotations consistent with
install/0000_00_cluster-version-operator_30_deployment.yaml and
_02_networkpolicy.yaml) and add kubernetes.io/description explaining the
resource's purpose; ensure you keep the existing
release.openshift.io/feature-gate: LightspeedProposals while adding the new
include.release.openshift.io/* entries and the kubernetes.io/description string.

In `@pkg/cvo/availableupdates.go`:
- Around line 186-190: The proposal creation is currently invoked inline
(optr.shouldCreateLightspeedProposals() ->
optr.maybeCreateLightspeedProposal(...)) which can block the available-updates
worker; change this to run asynchronously and bounded: when
shouldCreateLightspeedProposals() is true, spawn a goroutine to call
maybeCreateLightspeedProposal but pass a derived context with a short timeout
(context.WithTimeout) and use a global or package-level semaphore/channel to
bound concurrent proposal creations so you don’t spawn unlimited goroutines;
ensure maybeCreateLightspeedProposal observes the passed context, logs any
timeout/errors, and that the goroutine exits cleanly to avoid leaks so update
discovery remains non-blocking.

In `@pkg/cvo/cvo.go`:
- Around line 1219-1249: maybeCreateLightspeedProposal currently calls
readiness.RunAll with the caller's unbounded ctx which can block the
available-updates sync; modify maybeCreateLightspeedProposal to enforce an
overall deadline by creating a derived context with timeout (or spawn a
fire-and-forget goroutine using a context with timeout) before calling
readiness.RunAll, then call readiness.RunAll(ctxWithTimeout, dc,
optr.release.Version, targetVersion) and pass the resulting readinessJSON to
optr.proposalCreator.MaybeCreateProposal so the proposal creation cannot delay
the reconcile loop.

In `@pkg/proposal/proposal.go`:
- Around line 210-230: The loop that prints "Other recommended versions" should
operate on a precomputed filtered slice (e.g., others := filter updates where
u.Version != target) instead of iterating the raw updates so counting and
remaining calculations are correct; change the logic in the block that
references updates/target to first build others, only print the header when
len(others) > 0, iterate up to 5 entries from others, and compute remaining as
len(others)-count when truncating so the "... and N more" value is accurate
(update references to updates, target, count and remaining accordingly).
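A minimal sketch of the precompute-then-print shape the comment describes, with hypothetical names (`otherVersions` is not in the PR): filter first, then both the 5-entry cap and the "... and N more" remainder count the same slice.

```go
package main

import "fmt"

// otherVersions filters out the selected target before any counting, so
// the header condition, the display cap, and the remainder all agree.
func otherVersions(updates []string, target string, limit int) (shown []string, remaining int) {
	var others []string
	for _, u := range updates {
		if u != target {
			others = append(others, u)
		}
	}
	if len(others) <= limit {
		return others, 0
	}
	return others[:limit], len(others) - limit
}

func main() {
	updates := []string{"4.18.1", "4.18.2", "4.18.3", "4.18.4", "4.18.5", "4.18.6", "4.18.7"}
	shown, remaining := otherVersions(updates, "4.18.7", 5)
	fmt.Println(len(shown), "shown,", remaining, "more")
}
```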
- Around line 136-153: The sort comparator for the candidates slice currently
violates strict weak ordering when versions are equal because it returns the
same result for both directions; update the sorting to use sort.SliceStable on
candidates and change the Less function to: first compare
candidates[i].version.Compare(candidates[j].version) as before; if equal, check
if candidates[i].kind != candidates[j].kind and prefer updateKindRecommended
when kinds differ; if both version and kind are equal, use candidates[i].raw <
candidates[j].raw as a deterministic tiebreaker; after this stable sort you can
simply return candidates[0].raw and candidates[0].kind as the best candidate.
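A sketch of a comparator that satisfies strict weak ordering, per the comment. The `candidate` fields are assumptions, and version comparison is simplified to an integer rank (the real code compares semver values):

```go
package main

import (
	"fmt"
	"sort"
)

const updateKindRecommended = "recommended"

// candidate mirrors the shape the review describes; field names are
// illustrative, not the PR's.
type candidate struct {
	versionRank int // higher = newer; stand-in for a semver compare
	kind        string
	raw         string
}

// bestCandidate sorts with a strict weak ordering: newest version first,
// then prefer the recommended kind, then kind and raw as deterministic
// tiebreakers, and returns the front element.
func bestCandidate(cands []candidate) candidate {
	sort.SliceStable(cands, func(i, j int) bool {
		if cands[i].versionRank != cands[j].versionRank {
			return cands[i].versionRank > cands[j].versionRank
		}
		if cands[i].kind != cands[j].kind {
			iRec := cands[i].kind == updateKindRecommended
			jRec := cands[j].kind == updateKindRecommended
			if iRec != jRec {
				return iRec
			}
			return cands[i].kind < cands[j].kind
		}
		return cands[i].raw < cands[j].raw
	})
	return cands[0]
}

func main() {
	cands := []candidate{
		{versionRank: 2, kind: "conditional", raw: "4.18.2-b"},
		{versionRank: 2, kind: updateKindRecommended, raw: "4.18.2-a"},
		{versionRank: 1, kind: updateKindRecommended, raw: "4.18.1"},
	}
	fmt.Println(bestCandidate(cands).raw)
}
```

The key property: for every pair, exactly one of Less(i,j), Less(j,i), or "equivalent" holds, so the sort's behavior is defined and deterministic.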
- Around line 248-257: The sanitize function currently leaves characters like
'+', '/', '_' and other non-alphanumeric/non-dash characters untouched,
producing invalid DNS-1035 labels; update sanitize(s string) to first normalize
to lowercase, then replace any character that is not [a-z0-9-] (e.g., using a
regexp or rune check) with '-', keep the existing replacements for '.' and ' '
(or rely on the general rule), preserve the 20-char truncation behavior and the
TrimRight(s, "-") call so trailing dashes are removed, and preserve the current
behavior allowing digit-start labels (do not force an alphabetic prefix).
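The rule the comment proposes (lowercase, map everything outside `[a-z0-9-]` to `-`, truncate to 20, trim trailing dashes) can be sketched as below; this is the suggested behavior, not the PR's current implementation. Note DNS-1035 also forbids a leading digit, which the review explicitly chooses to keep allowing.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize maps an arbitrary version string to a name-safe fragment:
// lowercase, every character outside [a-z0-9-] becomes '-', truncated
// to 20 bytes, trailing dashes trimmed.
func sanitize(s string) string {
	s = strings.ToLower(s)
	var b strings.Builder
	for _, r := range s {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '-' {
			b.WriteRune(r)
		} else {
			b.WriteRune('-')
		}
	}
	out := b.String() // all ASCII now, so byte slicing is safe
	if len(out) > 20 {
		out = out[:20]
	}
	return strings.TrimRight(out, "-")
}

func main() {
	fmt.Println(sanitize("4.17.0-0.Nightly+2024_01"))
}
```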

In `@pkg/readiness/client.go`:
- Around line 3-7: The helper that constructs the label selector string
currently iterates a map and emits unsorted parts (e.g., "a=1,b=2" vs
"b=2,a=1"); to make output deterministic, collect the map entries into a slice
of "key=value" pairs, sort that slice using key as primary and value as
tiebreaker (i.e., sort by key then by value), then join with commas to produce
the selector; apply the same stable-sorting fix to the other occurrence that
builds label selector parts so both places produce deterministic, test-stable
output.
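The deterministic-selector fix is small: collect the pairs, sort, join. Go map iteration order is randomized, so sorting is what makes the output test-stable. A sketch of the suggested shape (the real helper's name and signature may differ):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FormatLabelSelector emits "key=value" pairs in sorted order so the
// selector string is identical across runs.
func FormatLabelSelector(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	// Sorting the joined "k=v" strings orders by key first, then value.
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(FormatLabelSelector(map[string]string{"b": "2", "a": "1"}))
}
```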
- Around line 153-163: CompareVersions currently masks parse errors by returning
0, so change its signature to return (int, error) instead of just int: parse a
and b with semver.ParseTolerant, if either parse fails return 0 and the parse
error, otherwise return va.Compare(vb), nil; update callers (notably the code in
api_deprecations.go that calls CompareVersions) to handle the error path
explicitly (e.g., treat parse errors as invalid input and skip or surface an
error) so parse failures are distinguishable from genuine equality.

In `@pkg/readiness/crd_compat.go`:
- Around line 15-67: CRDCompatCheck.Run currently ignores its current/target
params and never uses sectionErrors; either make it target-aware or explicitly
document/rename and remove unused vars. Option A (preferred): when target !=
current, consult a removed-version lookup (e.g.,
RemovedCRDVersionsForTarget(target) or a map you add) and flag CRDs whose
storedVersions appear in that removed set (update versionIssues accordingly),
and remove the dead sectionErrors or populate it with real errors; reference
CRDCompatCheck.Run, current, target, versionIssues, sectionErrors. Option B: if
you only want cluster-state hygiene, add a doc comment to CRDCompatCheck.Run
and/or rename the check to crd_stored_versions, remove current/target parameters
and the unused sectionErrors variable so the signature and function behavior
match.

In `@pkg/readiness/etcd_health.go`:
- Around line 33-50: The code counts pods as healthy based on phase=="Running",
which overstates etcd health; change the readiness check to inspect the pod
"Ready" condition instead (e.g., look through pod.Object.status.conditions for a
condition with type "Ready" and status "True" and use that to set
ready/healthyMembers) while still populating result["members"],
result["healthy_members"], and result["total_members"] as before; also update
TestEtcdHealthCheck in pkg/readiness/checks_test.go to add Ready pod conditions
to the fixture pods so the test reflects the new readiness logic.

In `@pkg/readiness/node_capacity.go`:
- Line 10: The comment/docstring for NodeCapacityCheck is inaccurate: it says it
"assesses node readiness and resource headroom" but the code only reports
readiness and schedulability counts. Update the NodeCapacityCheck comment, any
kubernetes.io/description annotations and nearby doc strings to state that the
check reports node readiness and schedulability counts (e.g., "reports counts of
Ready and Schedulable nodes" or similar), removing the incorrect reference to
resource headroom so the description matches the implementation.

In `@pkg/readiness/olm_lifecycle.go`:
- Around line 149-167: The target compatibility logic in the loop (fields
compat, parsedTarget, parsedMax, parsedMin, entry["compatible_with_target"],
incompatibleWithTarget) only checks maxOCP; update it so when hasTarget is true
you evaluate both min and max bounds: parse minOCP (if non-empty) and set a
minOk flag based on parsedTarget.Compare(parsedMin) >= 0 (or true if no min),
parse maxOCP and set maxOk based on parsedTarget.Compare(parsedMax) <= 0 (or
true if no max), then set entry["compatible_with_target"] = minOk && maxOk and
increment incompatibleWithTarget only when that result is false. Ensure error
handling for semver.ParseTolerant leaves the corresponding bound as
not-applicable (treat as true) rather than skipping the combined check.
- Around line 64-65: The PackageManifest indexing currently uses indexByName
(pkgIndex := indexByName(pkgManifests)), which only keys by metadata.name and
can collide across catalogs; change the index to use a composite key consisting
of status.catalogSource + "|" + status.catalogSourceNamespace + "|" +
(status.packageName or metadata.name as a fallback) (either by replacing
indexByName with a new indexByPackageCompositeKey or extending it), and update
the Subscription lookup that builds the pkg key (the code that currently looks
up pkgIndex[...]) to construct the same composite from subscription.Spec.Source
+ "|" + subscription.Spec.SourceNamespace + "|" + (pkgStatusPackageName or
metadata.name fallback). Also update any other places using pkgIndex (the other
lookups noted in the comment) to use the composite-key logic so
channel/compatibility matching uses the correct CatalogSource.
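The composite key described above can be sketched as a small helper. Field names mirror the comment (`status.catalogSource`, `status.catalogSourceNamespace`, `status.packageName`, `metadata.name`); the helper name itself is illustrative.

```go
package main

import "fmt"

// packageKey builds a catalog-scoped index key so PackageManifests with the
// same name in different catalogs no longer collide.
func packageKey(catalogSource, catalogSourceNamespace, packageName, metaName string) string {
	name := packageName
	if name == "" {
		name = metaName // fall back to metadata.name when status.packageName is unset
	}
	return catalogSource + "|" + catalogSourceNamespace + "|" + name
}

func main() {
	// The same package name from two catalogs now yields distinct keys.
	fmt.Println(packageKey("redhat-operators", "openshift-marketplace", "etcd", "etcd"))
	fmt.Println(packageKey("community-operators", "openshift-marketplace", "", "etcd"))
}
```

The Subscription side builds the same key from `spec.source`, `spec.sourceNamespace`, and the subscribed package name, so both lookups resolve against the same CatalogSource.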

In `@pkg/readiness/pdb_drain.go`:
- Around line 26-27: The code reads PDB thresholds using NestedString which
drops integer values; in pkg/readiness/pdb_drain.go replace the NestedString
calls for maxUnavailable and minAvailable with
unstructured.NestedFieldNoCopy(pdb.Object, "spec", "maxUnavailable") and
unstructured.NestedFieldNoCopy(pdb.Object, "spec", "minAvailable") to capture
the raw IntOrString values from pdb.Object, then convert those raw values to
strings only when building the output/issue map (or keep the raw types if
preferable) so numeric thresholds like 1 or 0 are preserved instead of becoming
empty strings.
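A sketch of the conversion step: once the raw IntOrString value is retrieved with `unstructured.NestedFieldNoCopy`, render it for the issue map without dropping numeric thresholds. Values decoded from JSON arrive as `int64` (typed objects) or `float64` (plain JSON); percentages stay strings. The helper name is illustrative.

```go
package main

import (
	"fmt"
	"strconv"
)

// thresholdString renders a raw IntOrString-ish value for display; numeric
// thresholds like 1 or 0 are preserved instead of becoming "".
func thresholdString(v any) string {
	switch t := v.(type) {
	case nil:
		return ""
	case string:
		return t // e.g. "25%"
	case int64:
		return strconv.FormatInt(t, 10)
	case float64:
		return strconv.FormatFloat(t, 'f', -1, 64)
	default:
		return fmt.Sprintf("%v", t)
	}
}

func main() {
	fmt.Println(thresholdString(int64(1))) // numeric maxUnavailable survives
	fmt.Println(thresholdString("25%"))    // percentage form unchanged
}
```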

---

Nitpick comments:
In `@pkg/cvo/cvo.go`:
- Around line 400-406: Store the dynamic client on the Operator (optr) instead
of relying on proposal.Creator.DynamicClient(): when creating the proposal
creator in the init block where proposal.NewCreator(dynamicClient, ...) is
called, assign dynamicClient to a new optr.dynamicClient field and pass that
same client into proposal.NewCreator; update maybeCreateLightspeedProposal and
readiness.RunAll callers to use optr.dynamicClient directly and remove the
DynamicClient() accessor from proposal.Creator so the proposal package no longer
exposes its internal client.

In `@pkg/readiness/api_deprecations.go`:
- Around line 21-30: Replace the fragile string check in the error handling
block inside the function that lists APIRequestCounts (the code checking if
strings.Contains(err.Error(), "not found")) with the typed Kubernetes error
check apierrors.IsNotFound(err); import "k8s.io/apimachinery/pkg/api/errors"
(alias apierrors if not already) and keep the existing behavior of returning the
warning map and nil error when the resource is absent, otherwise return the
wrapped fmt.Errorf as before.

In `@pkg/readiness/check_test.go`:
- Around line 150-224: The test TestRunAllMixedResults is not exercising the
real RunAll orchestration (it re-implements the logic and only toys with
AllChecks), so either make RunAll testable by accepting an injected []Check or
make AllChecks a package-level variable tests can override and then call RunAll
with []Check{okCheck, failCheck, partialCheck} and assert on the returned
*Output (including Meta.ChecksOK/Meta.ChecksErrored and per-check Data/Error
preservation), or remove the dead test; update the test to call RunAll (or
replace it) instead of manually iterating checks so timeouts, concurrency,
elapsed accounting and partial-data handling in RunAll are actually verified.

In `@pkg/readiness/client_test.go`:
- Around line 102-126: The "empty" subtest currently asserts nothing because
contains: []string{} makes the inner loop skip; update the test to use
strings.Contains(got, s) instead of the manual index walk and for the "empty"
case assert the exact expected output (e.g. for the test case with name "empty"
and labels map[string]string{} add an explicit check like if got != "" {
t.Errorf(...)}), or drop that case—locate the table-driven tests around
FormatLabelSelector in pkg/readiness/client_test.go and modify the test loop to
call strings.Contains and add the explicit equality assertion for the "empty"
entry.

In `@test/cvo/readiness.go`:
- Around line 20-74: Add Ginkgo labels to the suite (e.g., "TechPreview" and
"Serial" or the appropriate "CustomNoUpgrade"/"LightspeedProposals" labels) and
stop calling readiness.RunAll(...) inside every It; instead, run
readiness.RunAll(ctx, dynamicClient, currentVersion, targetVersion) once and
store its result in a suite-scoped variable (e.g., var output *readiness.Output)
populated in BeforeEach or a BeforeAll so each It reads from that cached output;
update all It blocks to reference the cached output and remove the repeated
RunAll calls, then run make update to refresh test metadata.
- Around line 192-194: The code does an unsafe type assertion on
result.Data["blocking_pdbs"] (and similarly result.Data["summary"]) which can
panic; change to a safe assertion using the comma-ok pattern (e.g., v, ok :=
result.Data["blocking_pdbs"]; cast, ok2 := v.([]map[string]any)) or a type
switch, assert with o.Expect that the value exists/is the expected type, and if
nil treat it as an empty slice/map before using len or indexing; update the
references to blockingPDBs and the summary extraction to use these safe checks
so tests fail with assertions instead of panicking.
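The comma-ok extraction can be sketched as a small helper: a missing or nil entry becomes an empty slice, and a wrong type is reported instead of panicking, so the test can fail with an assertion. The helper name and the key `"blocking_pdbs"` follow the review text; the return shape is illustrative.

```go
package main

import "fmt"

// blockingPDBs safely extracts the blocking-PDB list from a check's Data map.
func blockingPDBs(data map[string]any) ([]map[string]any, bool) {
	v, present := data["blocking_pdbs"]
	if !present || v == nil {
		return []map[string]any{}, true // absent: treat as empty
	}
	pdbs, ok := v.([]map[string]any)
	if !ok {
		return nil, false // unexpected type: let the caller assert and fail
	}
	return pdbs, true
}

func main() {
	pdbs, ok := blockingPDBs(map[string]any{})
	fmt.Println(len(pdbs), ok) // no panic on a missing key
}
```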
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 8966cf38-fdd4-4bf1-8ea3-9fcf63361158

📥 Commits

Reviewing files that changed from the base of the PR and between d3df6c0 and 1a82825.

⛔ Files ignored due to path filters (3)
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/openshift/api/features/features.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (27)
  • AGENTS.md
  • go.mod
  • install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml
  • install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml
  • install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml
  • pkg/cvo/availableupdates.go
  • pkg/cvo/cvo.go
  • pkg/cvo/status_test.go
  • pkg/featuregates/featuregates.go
  • pkg/proposal/proposal.go
  • pkg/proposal/proposal_test.go
  • pkg/readiness/api_deprecations.go
  • pkg/readiness/check.go
  • pkg/readiness/check_test.go
  • pkg/readiness/checks_test.go
  • pkg/readiness/client.go
  • pkg/readiness/client_test.go
  • pkg/readiness/cluster_conditions.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/etcd_health.go
  • pkg/readiness/network.go
  • pkg/readiness/node_capacity.go
  • pkg/readiness/olm_lifecycle.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/operator_health.go
  • pkg/readiness/pdb_drain.go
  • test/cvo/readiness.go

Comment thread AGENTS.md
Comment on lines +213 to +214
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -mod=vendor \
  -o _output/linux/amd64/cluster-version-operator ./cmd/cluster-version-operator/

⚠️ Potential issue | 🟡 Minor

Use the repository build target in the new workflow.

This section bypasses the supported CVO build path. Prefer make build, or explicitly document why this dev flow needs a raw cross-build.

Suggested doc adjustment
-CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -mod=vendor \
-  -o _output/linux/amd64/cluster-version-operator ./cmd/cluster-version-operator/
+make build

Based on learnings, “Use make build to build the CVO binary, outputs to _output/ directory”.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
make build
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 213 - 214, Replace the raw cross-build command for
the cluster-version-operator with the supported repository build target: remove
the CGO_ENABLED/GOOS/GOARCH go build line and instead instruct users to run the
repository's make build target (e.g., "make build") which produces the CVO
binary into the _output/ directory; if a raw cross-build must be kept, add an
explicit justification explaining why the dev flow cannot use "make build" and
document expected outputs and environment differences.

Comment thread AGENTS.md
Comment on lines +237 to +244
docker build --platform linux/amd64 -f Dockerfile.dev -t quay.io/harpatil/cvo-lightspeed:${TAG} .
docker push quay.io/harpatil/cvo-lightspeed:${TAG}

# 4. Deploy: single atomic patch (image + args + env) to avoid partial reverts
RELEASE_IMAGE=$(oc get clusterversion version -o jsonpath='{.status.desired.image}')
API_HOST=$(oc get infrastructure cluster -o jsonpath='{.status.apiServerInternalURI}' | sed 's|https://||' | cut -d: -f1)
oc patch deployment cluster-version-operator -n openshift-cluster-version --type json -p "[
  {\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/image\",\"value\":\"quay.io/harpatil/cvo-lightspeed:${TAG}\"},

⚠️ Potential issue | 🟡 Minor

Avoid hard-coding a personal image repository in shared docs.

Use an IMAGE variable or placeholder so contributors do not copy a registry they cannot push to.

Suggested doc adjustment
 TAG="v$(date +%s)"
-docker build --platform linux/amd64 -f Dockerfile.dev -t quay.io/harpatil/cvo-lightspeed:${TAG} .
-docker push quay.io/harpatil/cvo-lightspeed:${TAG}
+IMAGE="quay.io/yourname/cvo-lightspeed:${TAG}"
+docker build --platform linux/amd64 -f Dockerfile.dev -t "${IMAGE}" .
+docker push "${IMAGE}"
@@
-  {\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/image\",\"value\":\"quay.io/harpatil/cvo-lightspeed:${TAG}\"},
+  {\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/image\",\"value\":\"${IMAGE}\"},
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
IMAGE="quay.io/yourname/cvo-lightspeed:${TAG}"
docker build --platform linux/amd64 -f Dockerfile.dev -t "${IMAGE}" .
docker push "${IMAGE}"
# 4. Deploy: single atomic patch (image + args + env) to avoid partial reverts
RELEASE_IMAGE=$(oc get clusterversion version -o jsonpath='{.status.desired.image}')
API_HOST=$(oc get infrastructure cluster -o jsonpath='{.status.apiServerInternalURI}' | sed 's|https://||' | cut -d: -f1)
oc patch deployment cluster-version-operator -n openshift-cluster-version --type json -p "[
  {\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/image\",\"value\":\"${IMAGE}\"},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 237 - 244, Replace the hard-coded personal image
reference quay.io/harpatil/cvo-lightspeed:${TAG} with a configurable IMAGE
variable/placeholder used consistently across the build, push and oc patch
commands (the docker build, docker push lines and the value field in the oc
patch for deployment cluster-version-operator). Update the docs to reference
IMAGE or ${IMAGE} (and document that contributors should set IMAGE to their own
registry/tag) so the build/push and the patch operation use IMAGE instead of the
personal quay.io path.

Comment thread go.mod

replace github.com/onsi/ginkgo/v2 => github.com/openshift/onsi-ginkgo/v2 v2.6.1-0.20241205171354-8006f302fd12

replace github.com/openshift/api => github.com/harche/api v0.0.0-20260414192630-b7a8e3d157cb

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Show all OpenShift API imports and direct Lightspeed feature-gate usages affected by the replace.

rg -n -C2 'github\.com/openshift/api|FeatureGateLightspeedProposals|agentic\.openshift\.io'

Repository: openshift/cluster-version-operator

Length of output: 50391


🏁 Script executed:

# Find where FeatureGateLightspeedProposals is actually used in non-vendor code
rg -n 'FeatureGateLightspeedProposals' --type=go | grep -v '^vendor/' | head -20

Repository: openshift/cluster-version-operator

Length of output: 249


🏁 Script executed:

# Search for proposal-related unstructured CR creation patterns
rg -n 'Proposal|unstructured|agentic' --type=go pkg/ | head -30

Repository: openshift/cluster-version-operator

Length of output: 3414


🏁 Script executed:

# Check the featuregates.go file to understand dependencies
fd -n 'featuregates.go' -x cat -n {}

Repository: openshift/cluster-version-operator

Length of output: 308


🏁 Script executed:

# View featuregates.go around the usage lines
sed -n '120,145p' pkg/featuregates/featuregates.go

Repository: openshift/cluster-version-operator

Length of output: 934


🏁 Script executed:

# Search for proposal CR creation patterns or resources
rg -n 'ProposalRequest|proposals\.|agentic\.openshift' pkg/ --type=go

Repository: openshift/cluster-version-operator

Length of output: 1448


🏁 Script executed:

# Check the imports in featuregates.go
head -30 pkg/featuregates/featuregates.go

Repository: openshift/cluster-version-operator

Length of output: 1734


Avoid merging with a module-wide replace to a personal fork.

The replace directive redirects all github.com/openshift/api imports—including stable types like config/v1—to github.com/harche/api. However, analysis shows that only the FeatureGateLightspeedProposals feature gate constant from the fork is actually used (in pkg/featuregates/featuregates.go lines 126 and 138), while proposal CRs are created as unstructured objects (agentic.openshift.io/v1alpha1/Proposal), not typed structs from the API.

Remove this replace directive before merging, or upstream the feature gate symbol to the main openshift/api repository instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 112, The go.mod replace that points github.com/openshift/api
to your personal fork must be removed; edit go.mod to delete the line "replace
github.com/openshift/api => github.com/harche/api ..." and restore use of the
upstream module, then ensure the code referencing FeatureGateLightspeedProposals
in pkg/featuregates/featuregates.go either imports the upstream
github.com/openshift/api package (if the constant exists upstream) or you
instead vendor/define the specific constant locally or upstream the symbol to
the official repo before merging.

Comment on lines +1 to +29
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ota-advisory-prompt
  namespace: openshift-lightspeed
  annotations:
    release.openshift.io/feature-gate: LightspeedProposals
data:
  prompt: |
    You are an OpenShift upgrade advisor. Analyze the cluster readiness
    data in the proposal request and produce an upgrade risk assessment.

    The request contains a "Cluster Readiness Data" section with a JSON
    block. This was collected by the Cluster Version Operator — do not
    re-collect it. Parse the JSON, evaluate each check's results, and
    classify findings as blockers, warnings, or informational.

    Use the ota-upgrade-advisor skill for the decision framework and
    blocker classification rules. When findings need deeper investigation,
    use prometheus, platform-docs, redhat-support, or product-lifecycle
    skills.

    When the readiness data includes olm_operator_lifecycle results, use
    the product-lifecycle skill to cross-reference each operator's package
    name against the Red Hat Product Life Cycle API. Report support phase,
    EOL dates, and OCP compatibility from PLCC alongside the OLM data.

    Do not guess or assume cluster state. Do not execute upgrade commands.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm the sibling lightspeed manifests carry the expected profile annotations so this file isn't the only outlier.
fd -t f 'lightspeed' install
fd -t f 'lightspeed' install --exec rg -nH 'include.release.openshift.io|kubernetes.io/description|release.openshift.io/feature-gate' {}

Repository: openshift/cluster-version-operator

Length of output: 670


🏁 Script executed:

for file in install/0000_00_cluster-version-operator_5{0,1,2}_lightspeed-*.yaml; do
  if [ -f "$file" ]; then
    echo "=== $file ==="
    cat "$file"
    echo ""
  fi
done

Repository: openshift/cluster-version-operator

Length of output: 9278


Missing cluster-profile annotations and kubernetes.io/description.

All three related lightspeed manifest files (50_lightspeed-prompts.yaml, 51_lightspeed-agents.yaml, and 52_lightspeed-workflows.yaml) lack required annotations per coding guidelines for install/**:

  • "All manifests must have appropriate cluster-profile annotations (include.release.openshift.io/self-managed-high-availability, etc.)"
  • "Use kubernetes.io/description annotations to explain the resource's purpose"

Without cluster-profile annotations, the CVO manifest graph selector won't apply this ConfigMap on profile-scoped clusters, breaking the Agent manifest reference to ota-advisory-prompt at runtime.

🛡️ Suggested metadata block
 metadata:
   name: ota-advisory-prompt
   namespace: openshift-lightspeed
   annotations:
     release.openshift.io/feature-gate: LightspeedProposals
+    include.release.openshift.io/self-managed-high-availability: "true"
+    include.release.openshift.io/single-node-developer: "true"
+    kubernetes.io/description: |
+      System prompt used by the Lightspeed OTA advisory agent to analyze
+      Cluster Version Operator readiness data and classify upgrade findings
+      as blockers, warnings, or informational. Referenced by the Lightspeed
+      Agent via spec.systemPromptRef.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml` around
lines 1 - 29, The ota-advisory-prompt ConfigMap (metadata name:
ota-advisory-prompt, namespace: openshift-lightspeed) is missing required
cluster-profile annotations and a kubernetes.io/description; add the standard
include.release.openshift.io/* profile selectors (e.g.
include.release.openshift.io/self-managed-high-availability and other profiles
your repository requires) under metadata.annotations and add
kubernetes.io/description with a short purpose string describing this
ConfigMap’s role as the OTA advisory prompt used by Lightspeed agents; apply the
same annotation/description pattern to the related manifests that reference this
ConfigMap (the resources defined in 51_lightspeed-agents.yaml and
52_lightspeed-workflows.yaml) so the CVO graph selector will correctly install
these resources on profile-scoped clusters.

Comment on lines +4 to +7
metadata:
  name: ota-advisor
  annotations:
    release.openshift.io/feature-gate: LightspeedProposals

⚠️ Potential issue | 🟠 Major

Add the required release-profile and description annotations.

This manifest is feature-gated, but it still needs the appropriate include.release.openshift.io/* cluster-profile annotations and a kubernetes.io/description explaining the Agent’s purpose.

As per coding guidelines, “All manifests must have appropriate cluster-profile annotations” and “Use kubernetes.io/description annotations to explain the resource's purpose”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml` around
lines 4 - 7, This manifest for the Agent (metadata.name: ota-advisor) is missing
the required cluster-profile include annotations and a description; add
include.release.openshift.io/<profile-name>: "true" entries for every cluster
profile this feature should be included in (use the same profile keys your repo
uses, e.g., include.release.openshift.io/cluster: "true" or
include.release.openshift.io/optional: "true" as appropriate) alongside the
existing release.openshift.io/feature-gate annotation, and add a
kubernetes.io/description annotation that succinctly explains the Agent’s
purpose (e.g., "OTA Advisor Agent: collects and reports OTA proposal
recommendations for LightspeedProposals feature"). Ensure you update the
annotations block under metadata for the resource named ota-advisor.

Comment on lines +33 to +50
		podStatuses := make([]map[string]any, 0, len(etcdPods))
		healthyMembers := 0
		for _, pod := range etcdPods {
			phase := NestedString(pod.Object, "status", "phase")
			ready := phase == "Running"
			if ready {
				healthyMembers++
			}
			podStatuses = append(podStatuses, map[string]any{
				"name":  pod.GetName(),
				"node":  NestedString(pod.Object, "spec", "nodeName"),
				"phase": phase,
				"ready": ready,
			})
		}
		result["members"] = podStatuses
		result["healthy_members"] = healthyMembers
		result["total_members"] = len(etcdPods)

⚠️ Potential issue | 🟡 Minor

phase == "Running" is not the same as "healthy etcd member".

A pod in Running phase can still have unready containers (startup/readiness probe failing, member catching up, leader election, fsync stalls). Counting such a pod as a healthy etcd member over-reports healthy_members and may cause the Lightspeed advisor to classify an unsafe upgrade as safe.

Use the Ready pod condition instead (or in addition):

-		for _, pod := range etcdPods {
-			phase := NestedString(pod.Object, "status", "phase")
-			ready := phase == "Running"
-			if ready {
-				healthyMembers++
-			}
-			podStatuses = append(podStatuses, map[string]any{
-				"name":   pod.GetName(),
-				"node":   NestedString(pod.Object, "spec", "nodeName"),
-				"phase":  phase,
-				"ready":  ready,
-			})
-		}
+		for _, pod := range etcdPods {
+			phase := NestedString(pod.Object, "status", "phase")
+			ready := false
+			for _, c := range NestedSlice(pod.Object, "status", "conditions") {
+				cm, ok := c.(map[string]interface{})
+				if !ok {
+					continue
+				}
+				if NestedString(cm, "type") == "Ready" && NestedString(cm, "status") == string(ConditionTrue) {
+					ready = true
+					break
+				}
+			}
+			if ready {
+				healthyMembers++
+			}
+			podStatuses = append(podStatuses, map[string]any{
+				"name":  pod.GetName(),
+				"node":  NestedString(pod.Object, "spec", "nodeName"),
+				"phase": phase,
+				"ready": ready,
+			})
+		}

Note: this will require updating TestEtcdHealthCheck in pkg/readiness/checks_test.go to add Ready pod conditions to the fixture pods (currently only status.phase is set, so healthy_members would drop to 0).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
		podStatuses := make([]map[string]any, 0, len(etcdPods))
		healthyMembers := 0
		for _, pod := range etcdPods {
			phase := NestedString(pod.Object, "status", "phase")
			ready := false
			for _, c := range NestedSlice(pod.Object, "status", "conditions") {
				cm, ok := c.(map[string]interface{})
				if !ok {
					continue
				}
				if NestedString(cm, "type") == "Ready" && NestedString(cm, "status") == string(ConditionTrue) {
					ready = true
					break
				}
			}
			if ready {
				healthyMembers++
			}
			podStatuses = append(podStatuses, map[string]any{
				"name":  pod.GetName(),
				"node":  NestedString(pod.Object, "spec", "nodeName"),
				"phase": phase,
				"ready": ready,
			})
		}
		result["members"] = podStatuses
		result["healthy_members"] = healthyMembers
		result["total_members"] = len(etcdPods)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/etcd_health.go` around lines 33 - 50, The code counts pods as
healthy based on phase=="Running", which overstates etcd health; change the
readiness check to inspect the pod "Ready" condition instead (e.g., look through
pod.Object.status.conditions for a condition with type "Ready" and status "True"
and use that to set ready/healthyMembers) while still populating
result["members"], result["healthy_members"], and result["total_members"] as
before; also update TestEtcdHealthCheck in pkg/readiness/checks_test.go to add
Ready pod conditions to the fixture pods so the test reflects the new readiness
logic.

"k8s.io/client-go/dynamic"
)

// NodeCapacityCheck assesses node readiness and resource headroom.

⚠️ Potential issue | 🟡 Minor

Keep the check description aligned with its output.

This check does not calculate resource headroom; it reports readiness and schedulability counts.

Suggested wording
-// NodeCapacityCheck assesses node readiness and resource headroom.
+// NodeCapacityCheck assesses node readiness and schedulability.

As per coding guidelines, “When modifying code, check that nearby comments, kubernetes.io/description annotations, and doc strings still accurately describe the new behavior.”

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// NodeCapacityCheck assesses node readiness and resource headroom.
// NodeCapacityCheck assesses node readiness and schedulability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/node_capacity.go` at line 10, The comment/docstring for
NodeCapacityCheck is inaccurate: it says it "assesses node readiness and
resource headroom" but the code only reports readiness and schedulability
counts. Update the NodeCapacityCheck comment, any kubernetes.io/description
annotations and nearby doc strings to state that the check reports node
readiness and schedulability counts (e.g., "reports counts of Ready and
Schedulable nodes" or similar), removing the incorrect reference to resource
headroom so the description matches the implementation.

Comment on lines +64 to +65
	csvIndex := indexByNamespacedName(csvs)
	pkgIndex := indexByName(pkgManifests)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed: inspected pkg/readiness/olm_lifecycle.go (the indexByName and indexByNamespacedName helpers, the Subscription → PackageManifest lookup around lines 135–155, and extractOCPCompat) plus pkg/readiness/olm_lifecycle_test.go. The scripts confirmed that indexByName keys PackageManifests by metadata.name alone and that the existing tests never exercise two CatalogSources publishing the same package name.

🌐 Web query:

OLM PackageManifest API spec catalogSource sourceNamespace

💡 Result:

PackageManifest (packages.operators.coreos.com/v1) is a status-driven resource created by OLM for every package in a deployed CatalogSource. Key status fields:

  • catalogSource — name of the owning CatalogSource (e.g., "certified-operators")
  • catalogSourceNamespace — namespace of the owning CatalogSource (typically "olm" or "openshift-marketplace")
  • catalogSourcePublisher, catalogSourceDisplayName — catalog metadata
  • packageName — name of the operator package (e.g., "etcd")
  • defaultChannel — default channel for installation
  • channels — per-channel name, currentCSV, and entries listing CSVs with upgrade paths

Subscriptions reference a catalog via spec.source (CatalogSource name) and spec.sourceNamespace.

🌐 Web query:

operators.coreos.com PackageManifest status fields

💡 Result:

PackageManifestStatus declares catalogSource, catalogSourceDisplayName, catalogSourceNamespace, catalogSourcePublisher, packageName, channels (each with name, currentCSV, currentCSVDesc, entries, and deprecation), defaultChannel, provider, and top-level deprecation. The spec is typically empty; OLM populates the status from CatalogSources. This schema is consistent across OpenShift/OKD 4.12–4.17 docs.

Use composite key (source + sourceNamespace + packageName) when indexing PackageManifests.

Package names are not globally unique across catalogs. The current code indexes PackageManifests by metadata.name only, but Subscriptions specify both spec.source and spec.sourceNamespace to identify which catalog they depend on. Without matching on these fields, the code may attach compatibility and channel data from the wrong catalog if the same package name appears in multiple CatalogSources.

Change the PackageManifest index to use a composite key built from status.catalogSource, status.catalogSourceNamespace, and status.packageName (or metadata.name as fallback). Update the lookup at line 143 to construct the same key from the Subscription's source and sourceNamespace.

Also applies to: 142-145, 204-211
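The composite-key indexing described above can be sketched as follows; the helper name pkgKey and its fallback behavior are illustrative, not code from this PR:

```go
package main

import "fmt"

// pkgKey builds a composite index key for a PackageManifest. The field names
// (status.catalogSource, status.catalogSourceNamespace, status.packageName)
// follow the packages.operators.coreos.com/v1 API; the helper itself is a
// sketch of the suggested fix, not the PR's implementation.
func pkgKey(source, sourceNamespace, packageName, metaName string) string {
	name := packageName
	if name == "" {
		name = metaName // fall back to metadata.name when status.packageName is unset
	}
	return source + "|" + sourceNamespace + "|" + name
}

func main() {
	// Index side: key each PackageManifest by its owning catalog plus package name,
	// so the same package name in two catalogs no longer collides.
	idx := map[string]string{
		pkgKey("redhat-operators", "openshift-marketplace", "etcd", "etcd"):    "redhat-catalog-etcd",
		pkgKey("community-operators", "openshift-marketplace", "etcd", "etcd"): "community-catalog-etcd",
	}
	// Lookup side: a Subscription supplies spec.source and spec.sourceNamespace.
	fmt.Println(idx[pkgKey("community-operators", "openshift-marketplace", "etcd", "etcd")])
	// → community-catalog-etcd
}
```

The "|" separator is safe here because Kubernetes object and namespace names cannot contain it.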

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/olm_lifecycle.go` around lines 64 - 65, The PackageManifest
indexing currently uses indexByName (pkgIndex := indexByName(pkgManifests)),
which only keys by metadata.name and can collide across catalogs; change the
index to use a composite key consisting of status.catalogSource + "|" +
status.catalogSourceNamespace + "|" + (status.packageName or metadata.name as a
fallback) (either by replacing indexByName with a new indexByPackageCompositeKey
or extending it), and update the Subscription lookup that builds the pkg key
(the code that currently looks up pkgIndex[...]) to construct the same composite
from subscription.Spec.Source + "|" + subscription.Spec.SourceNamespace + "|" +
(pkgStatusPackageName or metadata.name fallback). Also update any other places
using pkgIndex (the other lookups noted in the comment) to use the composite-key
logic so channel/compatibility matching uses the correct CatalogSource.

Comment thread pkg/readiness/pdb_drain.go
Comment on lines +26 to +27
maxUnavailable := NestedString(pdb.Object, "spec", "maxUnavailable")
minAvailable := NestedString(pdb.Object, "spec", "minAvailable")

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Kubernetes policy/v1 PodDisruptionBudgetSpec minAvailable maxUnavailable IntOrString

💡 Result:

In policy/v1 PodDisruptionBudgetSpec, minAvailable and maxUnavailable are both IntOrString and mutually exclusive. IntOrString holds either an int32 (absolute pod count, e.g., 2) or a string percentage (e.g., "50%"), and it serializes as the inner type directly, so an unstructured object carries the value as a number or a string depending on how the PDB was written.

🏁 Scripts executed: inspected pkg/readiness/pdb_drain.go, the local NestedString helper in pkg/readiness/client.go, and the vendored unstructured.NestedString and NestedFieldNoCopy implementations. NestedString performs a string type assertion, so integer thresholds fail the assertion and come back as "". No existing test fixtures cover integer-valued PDB thresholds, and the unstructured package is not yet imported in pdb_drain.go.


Preserve numeric PDB threshold values.

minAvailable and maxUnavailable are Kubernetes IntOrString fields that can be either integers or percentage strings. Using NestedString on an integer value causes a type assertion to fail, returning an empty string instead. PDBs with minAvailable: 1 or maxUnavailable: 0 will show empty threshold fields in the proposal data.

Use unstructured.NestedFieldNoCopy to capture the raw value and preserve both numeric and string formats:

Fix
+	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
-		maxUnavailable := NestedString(pdb.Object, "spec", "maxUnavailable")
-		minAvailable := NestedString(pdb.Object, "spec", "minAvailable")
+		maxUnavailable, _, _ := unstructured.NestedFieldNoCopy(pdb.Object, "spec", "maxUnavailable")
+		minAvailable, _, _ := unstructured.NestedFieldNoCopy(pdb.Object, "spec", "minAvailable")

Then cast to string when outputting if needed, or use the raw values in the issues map.
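A minimal sketch of that normalization, assuming the raw value arrives as interface{} from unstructured.NestedFieldNoCopy (JSON numbers decode as float64; the int64 case covers programmatically built objects); thresholdString is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strconv"
)

// thresholdString normalizes a PDB threshold read from an unstructured
// object into a display string, preserving both numeric and percentage forms.
func thresholdString(v interface{}) string {
	switch t := v.(type) {
	case nil:
		return "" // field not set on this PDB
	case string:
		return t // percentage form, e.g. "50%"
	case int64:
		return strconv.FormatInt(t, 10)
	case float64:
		// JSON decoding yields float64 for numbers; PDB thresholds are whole.
		return strconv.FormatInt(int64(t), 10)
	default:
		return fmt.Sprintf("%v", t)
	}
}

func main() {
	fmt.Println(thresholdString(float64(1))) // minAvailable: 1 decoded from JSON → 1
	fmt.Println(thresholdString("50%"))      // → 50%
	fmt.Println(thresholdString(nil))        // → empty string
}
```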

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/pdb_drain.go` around lines 26 - 27, The code reads PDB
thresholds using NestedString which drops integer values; in
pkg/readiness/pdb_drain.go replace the NestedString calls for maxUnavailable and
minAvailable with unstructured.NestedFieldNoCopy(pdb.Object, "spec",
"maxUnavailable") and unstructured.NestedFieldNoCopy(pdb.Object, "spec",
"minAvailable") to capture the raw IntOrString values from pdb.Object, then
convert those raw values to strings only when building the output/issue map (or
keep the raw types if preferable) so numeric thresholds like 1 or 0 are
preserved instead of becoming empty strings.

@harche harche force-pushed the lightspeed-proposals branch from 1a82825 to 38fbcfb Compare April 21, 2026 19:40
@wking wking changed the title from "WIP: pkg/cvo: Add Lightspeed proposal integration (OCPSTRAT-2618)" to "OTA-1965: WIP: pkg/cvo: Add Lightspeed proposal integration" Apr 21, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 21, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 21, 2026

@harche: This pull request references OTA-1965 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


// Create a LightspeedProposal if the feature is enabled and updates are available.
// This is best-effort and must never block update discovery.
if optr.shouldCreateLightspeedProposals() {
Member

@hongkailiu hongkailiu Apr 21, 2026

I am thinking of making another queue (thus the handling/consumer will be in another goroutine) which makes it look like a controller.

optr.proposalController.Queue().Add(optr.proposalController.QueueKey())

The details are:

79d8370#diff-4229ccef40cdb3dd7a8e5ca230d85fa0e74bbc265511ddd94f53acffbcd19b79R186

I did not know you are working on it already. 😄
WDYT?

Contributor Author

Sorry, it seems there is some misunderstanding. If you are already working on it, please continue on that path; I can close this PR and link yours.

Contributor Author

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 21, 2026
hongkailiu added a commit to hongkailiu/cluster-version-operator that referenced this pull request Apr 21, 2026
```
### fetch openshift#1379
$ git add install/0000_00_cluster-version-operator_4*
```
@harche harche force-pushed the lightspeed-proposals branch from 38fbcfb to f5cc163 Compare April 22, 2026 11:18
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

♻️ Duplicate comments (3)
pkg/readiness/node_capacity.go (1)

10-10: ⚠️ Potential issue | 🟡 Minor

Keep the check description aligned with its output.

This check reports node readiness and schedulability counts; it does not calculate resource headroom.

Proposed wording
-// NodeCapacityCheck assesses node readiness and resource headroom.
+// NodeCapacityCheck assesses node readiness and schedulability.

As per coding guidelines, “When modifying code, check that nearby comments, kubernetes.io/description annotations, and doc strings still accurately describe the new behavior.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/node_capacity.go` at line 10, The comment for NodeCapacityCheck
is inaccurate—update the docstring/comment and any kubernetes.io/description
annotations near the NodeCapacityCheck function to reflect that it reports node
readiness and schedulability counts (not resource headroom); locate the
NodeCapacityCheck declaration in node_capacity.go and change the description
text to explicitly say it assesses node readiness and counts
schedulable/unschedulable nodes so the wording matches the actual output.
pkg/cvo/cvo.go (1)

775-779: ⚠️ Potential issue | 🟠 Major

Keep the proposal path from blocking sync.

This path is described as best-effort and non-blocking, but handleProposalAnnotation synchronously runs readiness checks downstream. Add an overall deadline at minimum, or enqueue this on a separate worker so proposal generation cannot delay status reconciliation.

Minimum bounded-call fix
 	if optr.shouldCreateLightspeedProposals() {
-		optr.handleProposalAnnotation(ctx, original)
+		proposalCtx, cancel := context.WithTimeout(ctx, 2*time.Minute)
+		defer cancel()
+		optr.handleProposalAnnotation(proposalCtx, original)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cvo/cvo.go` around lines 775 - 779, The proposal path currently calls
optr.handleProposalAnnotation synchronously from the sync path (guarded by
optr.shouldCreateLightspeedProposals()) which can block reconciliation; change
this to a non-blocking bounded call by creating a short child context with a
timeout (e.g., context.WithTimeout) and invoking handleProposalAnnotation in a
separate goroutine (or enqueue the request to an existing worker queue) so the
main sync returns immediately; ensure you pass the timeout-bound ctx into
handleProposalAnnotation and log any timeout/error outcomes instead of letting
it delay status reconciliation.
pkg/readiness/pdb_drain.go (1)

26-27: ⚠️ Potential issue | 🟡 Minor

Preserve numeric PDB threshold values.

minAvailable and maxUnavailable can be integer or percentage values. Reading them with NestedString loses integer thresholds, so proposal readiness data can omit the value that is actually blocking drains.

Proposed fix
 import (
 	"context"
 	"fmt"
 
+	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
 	"k8s.io/client-go/dynamic"
 )
@@
-		maxUnavailable := NestedString(pdb.Object, "spec", "maxUnavailable")
-		minAvailable := NestedString(pdb.Object, "spec", "minAvailable")
+		maxUnavailable, _, _ := unstructured.NestedFieldNoCopy(pdb.Object, "spec", "maxUnavailable")
+		minAvailable, _, _ := unstructured.NestedFieldNoCopy(pdb.Object, "spec", "minAvailable")

Read-only verification:

#!/bin/bash
# Verify current PDB threshold extraction and helper behavior.
rg -n -C3 'maxUnavailable|minAvailable|func NestedString' pkg/readiness

Also applies to: 34-42

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/pdb_drain.go` around lines 26 - 27, The code currently calls
NestedString(pdb.Object, "spec", "maxUnavailable") and NestedString(...
"minAvailable") which coerces numeric PDB thresholds to strings and loses
integer values; change the extraction to fetch the raw value (use
unstructured.NestedFieldCopy or NestedFieldNoCopy on pdb.Object with
"spec","maxUnavailable" and "spec","minAvailable"), then type-switch the
returned interface{} to handle integers (int/int64/float64) and strings,
converting numbers to their exact string representation only when needed so
numeric thresholds are preserved; update any downstream usage that assumes
string to accept the normalized string produced by your type-switch.
🧹 Nitpick comments (2)
pkg/proposal/proposal_test.go (1)

139-161: LGTM on coverage.

Table-driven tests cover the z-stream / minor / unknown cases and unparseable inputs. One naming nit: the "major" case (Line 148) actually exercises the minor-bump branch — consider renaming to something like "major-bump-classified-as-minor" to make the intent obvious, since classifyUpdate deliberately doesn’t distinguish major bumps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/proposal/proposal_test.go` around lines 139 - 161, Rename the misleading
test case name inside TestClassifyUpdate so it reflects the behavior being
tested: change the struct entry with name "major" (which calls
classifyUpdate("4.15.1","5.0.0")) to a clearer label such as
"major-bump-classified-as-minor" (keeping the expected value "minor"); this
makes the intent explicit while leaving the classifyUpdate call and expected
assertion unchanged.
pkg/proposal/proposal.go (1)

275-283: Consider making maxAttempts configurable.

maxAttempts: int64(2) is hard-coded. The CRD schema supports 0–20 and Config already plumbs other tunables (Namespace, Workflow, PromptConfigMap); exposing MaxAttempts (with the current 2 as default) avoids future code churn and keeps behavior consistent with how other knobs are surfaced.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/proposal/proposal.go` around lines 275 - 283, Add a MaxAttempts
configuration knob and use it instead of the hard-coded int64(2): add a
MaxAttempts int (default 2) to the Config type that c.config references,
validate/clip it to the CRD-supported range 0–20 when the config is constructed
or loaded, and replace the literal int64(2) in the proposal spec construction
(where "spec" -> "maxAttempts" is set) with int64(c.config.MaxAttempts) so the
value is configurable and bounded.
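A clamped knob along the lines the prompt suggests might look like the sketch below. The `Config` type and field names are illustrative stand-ins, not the repository's actual types:

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the proposal creator's Config;
// only the new knob is shown.
type Config struct {
	MaxAttempts int
}

// clampedMaxAttempts returns MaxAttempts bounded to the CRD-supported
// 0-20 range, treating an unset (zero-value) config as the current
// default of 2. Whether 0 should mean "unset" or a literal 0 is a
// design choice the real change would need to settle, e.g. with a
// *int field or an explicit default at construction time.
func clampedMaxAttempts(c Config) int64 {
	n := c.MaxAttempts
	if n == 0 {
		n = 2
	}
	if n < 0 {
		n = 0
	}
	if n > 20 {
		n = 20
	}
	return int64(n)
}

func main() {
	fmt.Println(clampedMaxAttempts(Config{}))                // default
	fmt.Println(clampedMaxAttempts(Config{MaxAttempts: 25})) // clipped
}
```

The proposal spec construction would then set `"maxAttempts"` from `clampedMaxAttempts(c.config)` instead of the literal `int64(2)`.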
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml`:
- Around line 44-47: The CRD's openAPIV3Schema.description claims "Proposal is
cluster-scoped" but the CRD sets scope: Namespaced; fix by either changing
scope: Namespaced to scope: Cluster in
install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml or by
updating openAPIV3Schema.description (and any kubernetes.io/description/doc
comments) to accurately state it is namespaced; then ensure this choice matches
the runtime usage in pkg/proposal/proposal.go (the Create(...) call) and update
nearby comments/annotations to be consistent with the selected scope.
- Around line 6-10: The CRD manifest metadata for proposals.agentic.openshift.io
is missing the cluster-profile annotations used by CVO; update the
metadata.annotations block in
install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml to add
the same include.release.openshift.io/* profile keys used across other install
manifests (for example
include.release.openshift.io/self-managed-high-availability: "true" and any
other profiles your repo requires) so the CRD is rendered under the appropriate
cluster profiles when the feature gate (release.openshift.io/feature-gate:
LightspeedProposals) is enabled.

In `@install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml`:
- Around line 6-10: Add the required release profile annotations and a
description to the CRD metadata: inside the metadata.annotations block for the
resource named agents.agentic.openshift.io, add the
include.release.openshift.io/* profile annotations (e.g.,
include.release.openshift.io/approved: "true" and the appropriate profile keys
used across manifests such as include.release.openshift.io/optional,
include.release.openshift.io/cluster or similar per your repo conventions) and
add a kubernetes.io/description annotation that succinctly explains this is a
temporary Agent CRD used for LightspeedProposals and its intended lifecycle;
update the annotations list under metadata.annotations in the manifest where
name: agents.agentic.openshift.io appears.
- Around line 98-115: The CRD currently allows invalid Agent specs because
llmRef.name is not required and outputFields vs rawOutputSchema mutual exclusion
is not enforced; update the Agent CRD OpenAPI schema: mark llmRef.name as
required (add "required: [\"name\"]" under llmRef properties or add it to the
containing object) and add a validation that enforces mutual exclusivity between
spec.outputFields and spec.rawOutputSchema (use oneOf/anyOf with schema
predicates or an x-kubernetes-validation rule that checks exactly one is set),
applying the same changes to the other Agent-like schema blocks referenced
(lines ~213-447, 518-521) so invalid Agents are rejected at admission.

In `@install/0000_00_cluster-version-operator_47_lightspeed-crd-workflows.yaml`:
- Around line 6-10: The manifest for the CRD named
workflows.agentic.openshift.io is missing required cluster-profile and
description annotations; add appropriate include.release.openshift.io/* profile
annotations (e.g., include.release.openshift.io/accept-never,
include.release.openshift.io/optional, or the correct profiles your repo uses)
under metadata.annotations and add a kubernetes.io/description annotation that
briefly explains this is a temporary Workflow CRD for LightspeedProposals and
its intended scope/purpose so the resource has both release-profile and
descriptive metadata.
- Around line 95-119: The CRD descriptions state agentRef must be omitted or nil
when skip is true but the validation only requires agentRef.name when skip is
false; add a CEL validation to enforce the contract (e.g., a rule that when skip
== true then agentRef is null/absent or agentRef.name is empty, and when skip ==
false agentRef.name must be present) for the step object containing the skip and
agentRef fields (referencing agentRef, agentRef.name and skip) or alternatively
relax the description to match existing CEL rules—apply the same change to the
other step schemas called out (lines 127-151, 159-183, 189-198).

In `@pkg/cvo/cvo.go`:
- Around line 1275-1286: The code uses optr.release.Version for readiness output
and proposal creation which can miss merged fallback metadata; call and use the
merged currentVersion() result instead: obtain v := optr.currentVersion() (or
however currentVersion() is exposed), pass v into readiness.RunAll(ctx, dc, v,
targetVersion) and into optr.proposalCreator.MaybeCreateProposal(ctx, v,
targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON), and
remove references to optr.release.Version so readinessJSON and proposal metadata
consistently use the merged version.

In `@pkg/readiness/network.go`:
- Around line 33-43: The readiness checker currently embeds raw proxy config
into result by reading GetResource(ctx, dc, GVRProxy, "cluster") and putting
NestedString(proxy.Object, "spec", "httpProxy"/"httpsProxy"/"noProxy") into
result; change this to avoid exposing credentials/domains by replacing the raw
strings with redacted summaries or booleans (e.g., "http_proxy": {"present":
true, "redacted": true, "length": N} or simply {"present": true}) and/or
sanitized metadata (e.g., host-only or masked value), update the code that sets
result["proxy"] (and any callers expecting these keys) accordingly, and keep
error handling via SectionError unchanged; use the proxy variable, GVRProxy,
GetResource and NestedString locations to find the block to modify.

In `@test/cvo/readiness.go`:
- Line 20: The g.Describe block for the cluster-version-operator readiness E2E
test is missing Ginkgo labels; update the g.Describe call (the Describe defined
as var _ = g.Describe(`[Jira:"Cluster Version Operator"]
cluster-version-operator readiness checks`, func() { ... })) to include g.Label
calls for CI categorization (for example add g.Label("Serial") and any relevant
ticket/feature label like g.Label("OTA-1813") or a TechPreview label)
immediately inside the Describe block so the suite is selectable/filterable by
CI tooling.

---

Duplicate comments:
In `@pkg/cvo/cvo.go`:
- Around line 775-779: The proposal path currently calls
optr.handleProposalAnnotation synchronously from the sync path (guarded by
optr.shouldCreateLightspeedProposals()) which can block reconciliation; change
this to a non-blocking bounded call by creating a short child context with a
timeout (e.g., context.WithTimeout) and invoking handleProposalAnnotation in a
separate goroutine (or enqueue the request to an existing worker queue) so the
main sync returns immediately; ensure you pass the timeout-bound ctx into
handleProposalAnnotation and log any timeout/error outcomes instead of letting
it delay status reconciliation.

In `@pkg/readiness/node_capacity.go`:
- Line 10: The comment for NodeCapacityCheck is inaccurate—update the
docstring/comment and any kubernetes.io/description annotations near the
NodeCapacityCheck function to reflect that it reports node readiness and
schedulability counts (not resource headroom); locate the NodeCapacityCheck
declaration in node_capacity.go and change the description text to explicitly
say it assesses node readiness and counts schedulable/unschedulable nodes so the
wording matches the actual output.

In `@pkg/readiness/pdb_drain.go`:
- Around line 26-27: The code currently calls NestedString(pdb.Object, "spec",
"maxUnavailable") and NestedString(... "minAvailable") which coerces numeric PDB
thresholds to strings and loses integer values; change the extraction to fetch
the raw value (use unstructured.NestedFieldCopy or NestedFieldNoCopy on
pdb.Object with "spec","maxUnavailable" and "spec","minAvailable"), then
type-switch the returned interface{} to handle integers (int/int64/float64) and
strings, converting numbers to their exact string representation only when
needed so numeric thresholds are preserved; update any downstream usage that
assumes string to accept the normalized string produced by your type-switch.

---

Nitpick comments:
In `@pkg/proposal/proposal_test.go`:
- Around line 139-161: Rename the misleading test case name inside
TestClassifyUpdate so it reflects the behavior being tested: change the struct
entry with name "major" (which calls classifyUpdate("4.15.1","5.0.0")) to a
clearer label such as "major-bump-classified-as-minor" (keeping the expected
value "minor"); this makes the intent explicit while leaving the classifyUpdate
call and expected assertion unchanged.

In `@pkg/proposal/proposal.go`:
- Around line 275-283: Add a MaxAttempts configuration knob and use it instead
of the hard-coded int64(2): add a MaxAttempts int (default 2) to the Config type
that c.config references, validate/clip it to the CRD-supported range 0–20 when
the config is constructed or loaded, and replace the literal int64(2) in the
proposal spec construction (where "spec" -> "maxAttempts" is set) with
int64(c.config.MaxAttempts) so the value is configurable and bounded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: e7f06f79-f57a-4dae-a3b6-f452cd6f89ce

📥 Commits

Reviewing files that changed from the base of the PR and between 1a82825 and f5cc163.

⛔ Files ignored due to path filters (3)
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/openshift/api/features/features.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (30)
  • AGENTS.md
  • go.mod
  • install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml
  • install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml
  • install/0000_00_cluster-version-operator_47_lightspeed-crd-workflows.yaml
  • install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml
  • install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml
  • install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml
  • pkg/cvo/availableupdates.go
  • pkg/cvo/cvo.go
  • pkg/cvo/status_test.go
  • pkg/featuregates/featuregates.go
  • pkg/proposal/proposal.go
  • pkg/proposal/proposal_test.go
  • pkg/readiness/api_deprecations.go
  • pkg/readiness/check.go
  • pkg/readiness/check_test.go
  • pkg/readiness/checks_test.go
  • pkg/readiness/client.go
  • pkg/readiness/client_test.go
  • pkg/readiness/cluster_conditions.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/etcd_health.go
  • pkg/readiness/network.go
  • pkg/readiness/node_capacity.go
  • pkg/readiness/olm_lifecycle.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/operator_health.go
  • pkg/readiness/pdb_drain.go
  • test/cvo/readiness.go
✅ Files skipped from review due to trivial changes (8)
  • go.mod
  • pkg/cvo/status_test.go
  • install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml
  • install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml
  • pkg/readiness/operator_health.go
  • pkg/readiness/client_test.go
  • pkg/cvo/availableupdates.go
  • pkg/readiness/checks_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • AGENTS.md
  • pkg/readiness/check_test.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/api_deprecations.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/cluster_conditions.go

Comment on lines +6 to +10
metadata:
annotations:
release.openshift.io/feature-gate: LightspeedProposals
controller-gen.kubebuilder.io/version: v0.19.0
name: proposals.agentic.openshift.io

⚠️ Potential issue | 🟠 Major

Missing cluster-profile include.release.openshift.io/* annotations.

Other manifests in install/ carry include.release.openshift.io/self-managed-high-availability: "true" (and friends). This CRD has only the feature-gate annotation, which means CVO may not render it on any profile. Please add the appropriate cluster-profile annotations so the stub CRD is actually installed when the feature gate is enabled.

As per coding guidelines, “All manifests must have appropriate cluster-profile annotations (include.release.openshift.io/self-managed-high-availability, etc.)”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml`
around lines 6 - 10, The CRD manifest metadata for
proposals.agentic.openshift.io is missing the cluster-profile annotations used
by CVO; update the metadata.annotations block in
install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml to add
the same include.release.openshift.io/* profile keys used across other install
manifests (for example
include.release.openshift.io/self-managed-high-availability: "true" and any
other profiles your repo requires) so the CRD is rendered under the appropriate
cluster profiles when the feature gate (release.openshift.io/feature-gate:
LightspeedProposals) is enabled.

Comment on lines +44 to +47
operator itself (escalation child proposals).\n\nProposal is cluster-scoped.
The operator watches for new Proposals and\ndrives them through the lifecycle
automatically. Users interact with\nproposals in the Proposed phase to approve,
deny, or escalate.\n\nExample — a remediation proposal targeting a specific

⚠️ Potential issue | 🟠 Major

Schema description contradicts scope: Namespaced.

Line 18 declares scope: Namespaced, but the openAPIV3Schema.description states “Proposal is cluster-scoped. The operator watches for new Proposals and drives them through the lifecycle automatically.” This will confuse consumers/agents reading the CRD. Either make the CRD cluster-scoped (to match the upstream intent and the stub comment at line 1–2) or update the description to reflect namespaced behavior — and double-check this aligns with the namespaced Create(...) call in pkg/proposal/proposal.go (Line 284).

As per coding guidelines, “check that nearby comments, kubernetes.io/description annotations, and doc strings still accurately describe the new behavior. Outdated documentation is worse than no documentation.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml`
around lines 44 - 47, The CRD's openAPIV3Schema.description claims "Proposal is
cluster-scoped" but the CRD sets scope: Namespaced; fix by either changing
scope: Namespaced to scope: Cluster in
install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml or by
updating openAPIV3Schema.description (and any kubernetes.io/description/doc
comments) to accurately state it is namespaced; then ensure this choice matches
the runtime usage in pkg/proposal/proposal.go (the Create(...) call) and update
nearby comments/annotations to be consistent with the selected scope.

Comment on lines +6 to +10
metadata:
annotations:
release.openshift.io/feature-gate: LightspeedProposals
controller-gen.kubebuilder.io/version: v0.19.0
name: agents.agentic.openshift.io

⚠️ Potential issue | 🟡 Minor

Add release profile and resource description annotations.

This new install manifest only has the feature-gate annotation. Add the appropriate include.release.openshift.io/* profile annotations and a kubernetes.io/description explaining the temporary Agent CRD.

As per coding guidelines, “All manifests must have appropriate cluster-profile annotations” and “Use kubernetes.io/description annotations to explain the resource's purpose.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml`
around lines 6 - 10, Add the required release profile annotations and a
description to the CRD metadata: inside the metadata.annotations block for the
resource named agents.agentic.openshift.io, add the
include.release.openshift.io/* profile annotations (e.g.,
include.release.openshift.io/approved: "true" and the appropriate profile keys
used across manifests such as include.release.openshift.io/optional,
include.release.openshift.io/cluster or similar per your repo conventions) and
add a kubernetes.io/description annotation that succinctly explains this is a
temporary Agent CRD used for LightspeedProposals and its intended lifecycle;
update the annotations list under metadata.annotations in the manifest where
name: agents.agentic.openshift.io appears.

Comment on lines +98 to +115
llmRef:
description: |-
llmRef references a cluster-scoped LlmProvider CR that supplies the
LLM backend for this agent. The operator resolves this reference at
reconcile time and configures the sandbox pod with the provider's
credentials and model.
properties:
name:
default: ""
description: |-
Name of the referent.
This field is effectively required, but due to backwards compatibility is
allowed to be empty. Instances of this type with an empty value here are
almost certainly wrong.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
type: string
type: object
x-kubernetes-map-type: atomic

⚠️ Potential issue | 🟠 Major

Reject invalid Agent specs at admission.

llmRef.name is effectively required but can be empty, and outputFields/rawOutputSchema are documented as mutually exclusive without enforcement. Add spec-level validation so invalid Agents fail early.

Proposed validation
             required:
             - llmRef
             - skills
             type: object
+            x-kubernetes-validations:
+            - message: llmRef.name is required
+              rule: has(self.llmRef) && self.llmRef.name != ''
+            - message: outputFields and rawOutputSchema are mutually exclusive
+              rule: '!(has(self.outputFields) && has(self.rawOutputSchema))'

Also applies to: 213-447, 518-521

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml`
around lines 98 - 115, The CRD currently allows invalid Agent specs because
llmRef.name is not required and outputFields vs rawOutputSchema mutual exclusion
is not enforced; update the Agent CRD OpenAPI schema: mark llmRef.name as
required (add "required: [\"name\"]" under llmRef properties or add it to the
containing object) and add a validation that enforces mutual exclusivity between
spec.outputFields and spec.rawOutputSchema (use oneOf/anyOf with schema
predicates or an x-kubernetes-validation rule that checks exactly one is set),
applying the same changes to the other Agent-like schema blocks referenced
(lines ~213-447, 518-521) so invalid Agents are rejected at admission.

Comment on lines +6 to +10
metadata:
annotations:
release.openshift.io/feature-gate: LightspeedProposals
controller-gen.kubebuilder.io/version: v0.19.0
name: workflows.agentic.openshift.io

⚠️ Potential issue | 🟡 Minor

Add release profile and resource description annotations.

This new install manifest only has the feature-gate annotation. Add the appropriate include.release.openshift.io/* profile annotations and a kubernetes.io/description explaining the temporary Workflow CRD.

As per coding guidelines, “All manifests must have appropriate cluster-profile annotations” and “Use kubernetes.io/description annotations to explain the resource's purpose.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install/0000_00_cluster-version-operator_47_lightspeed-crd-workflows.yaml`
around lines 6 - 10, The manifest for the CRD named
workflows.agentic.openshift.io is missing required cluster-profile and
description annotations; add appropriate include.release.openshift.io/* profile
annotations (e.g., include.release.openshift.io/accept-never,
include.release.openshift.io/optional, or the correct profiles your repo uses)
under metadata.annotations and add a kubernetes.io/description annotation that
briefly explains this is a temporary Workflow CRD for LightspeedProposals and
its intended scope/purpose so the resource has both release-profile and
descriptive metadata.

Comment thread pkg/cvo/cvo.go Outdated
Comment on lines +1275 to +1286
dc := optr.proposalCreator.DynamicClient()
output := readiness.RunAll(ctx, dc, optr.release.Version, targetVersion)
var readinessJSON string
if data, err := json.Marshal(output); err != nil {
klog.Warningf("Failed to marshal readiness check output: %v", err)
} else {
readinessJSON = string(data)
klog.V(2).Infof("Readiness check completed (%d/%d ok, %.1fs): %s",
output.Meta.ChecksOK, output.Meta.TotalChecks, output.Meta.ElapsedSeconds, readinessJSON)
}

optr.proposalCreator.MaybeCreateProposal(ctx, optr.release.Version, targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON)

⚠️ Potential issue | 🟡 Minor

Use currentVersion() for proposal metadata.

optr.release.Version can miss fallback metadata that currentVersion() already merges from cached available updates. Use the merged value consistently for readiness output and proposal labels/request content.

Proposed fix
 	dc := optr.proposalCreator.DynamicClient()
-	output := readiness.RunAll(ctx, dc, optr.release.Version, targetVersion)
+	currentVersion := optr.currentVersion().Version
+	output := readiness.RunAll(ctx, dc, currentVersion, targetVersion)
 	var readinessJSON string
 	if data, err := json.Marshal(output); err != nil {
 		klog.Warningf("Failed to marshal readiness check output: %v", err)
 	} else {
@@
-	optr.proposalCreator.MaybeCreateProposal(ctx, optr.release.Version, targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON)
+	optr.proposalCreator.MaybeCreateProposal(ctx, currentVersion, targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
dc := optr.proposalCreator.DynamicClient()
output := readiness.RunAll(ctx, dc, optr.release.Version, targetVersion)
var readinessJSON string
if data, err := json.Marshal(output); err != nil {
klog.Warningf("Failed to marshal readiness check output: %v", err)
} else {
readinessJSON = string(data)
klog.V(2).Infof("Readiness check completed (%d/%d ok, %.1fs): %s",
output.Meta.ChecksOK, output.Meta.TotalChecks, output.Meta.ElapsedSeconds, readinessJSON)
}
optr.proposalCreator.MaybeCreateProposal(ctx, optr.release.Version, targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON)
dc := optr.proposalCreator.DynamicClient()
currentVersion := optr.currentVersion().Version
output := readiness.RunAll(ctx, dc, currentVersion, targetVersion)
var readinessJSON string
if data, err := json.Marshal(output); err != nil {
klog.Warningf("Failed to marshal readiness check output: %v", err)
} else {
readinessJSON = string(data)
klog.V(2).Infof("Readiness check completed (%d/%d ok, %.1fs): %s",
output.Meta.ChecksOK, output.Meta.TotalChecks, output.Meta.ElapsedSeconds, readinessJSON)
}
optr.proposalCreator.MaybeCreateProposal(ctx, currentVersion, targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cvo/cvo.go` around lines 1275 - 1286, The code uses optr.release.Version
for readiness output and proposal creation which can miss merged fallback
metadata; call and use the merged currentVersion() result instead: obtain v :=
optr.currentVersion() (or however currentVersion() is exposed), pass v into
readiness.RunAll(ctx, dc, v, targetVersion) and into
optr.proposalCreator.MaybeCreateProposal(ctx, v, targetVersion, targetKind,
config.Spec.Channel, au.Updates, readinessJSON), and remove references to
optr.release.Version so readinessJSON and proposal metadata consistently use the
merged version.

Comment thread pkg/readiness/network.go
Comment on lines +33 to +43
// Check proxy
proxy, err := GetResource(ctx, dc, GVRProxy, "cluster")
if err != nil {
SectionError(&sectionErrors, "proxy", err)
} else {
result["proxy"] = map[string]any{
"http_proxy": NestedString(proxy.Object, "spec", "httpProxy"),
"https_proxy": NestedString(proxy.Object, "spec", "httpsProxy"),
"no_proxy": NestedString(proxy.Object, "spec", "noProxy"),
}
}

⚠️ Potential issue | 🟠 Major

Do not embed raw proxy configuration in proposal data.

httpProxy / httpsProxy can include credentials, and noProxy can expose internal domains. Because readiness JSON is embedded into Proposal.spec.request, emit booleans or redacted summaries instead of raw values.

Proposed redaction
 	} else {
+		httpProxy := NestedString(proxy.Object, "spec", "httpProxy")
+		httpsProxy := NestedString(proxy.Object, "spec", "httpsProxy")
+		noProxy := NestedString(proxy.Object, "spec", "noProxy")
 		result["proxy"] = map[string]any{
-			"http_proxy":  NestedString(proxy.Object, "spec", "httpProxy"),
-			"https_proxy": NestedString(proxy.Object, "spec", "httpsProxy"),
-			"no_proxy":    NestedString(proxy.Object, "spec", "noProxy"),
+			"http_proxy_configured":  httpProxy != "",
+			"https_proxy_configured": httpsProxy != "",
+			"no_proxy_configured":    noProxy != "",
 		}
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Check proxy
proxy, err := GetResource(ctx, dc, GVRProxy, "cluster")
if err != nil {
SectionError(&sectionErrors, "proxy", err)
} else {
result["proxy"] = map[string]any{
"http_proxy": NestedString(proxy.Object, "spec", "httpProxy"),
"https_proxy": NestedString(proxy.Object, "spec", "httpsProxy"),
"no_proxy": NestedString(proxy.Object, "spec", "noProxy"),
}
}
// Check proxy
proxy, err := GetResource(ctx, dc, GVRProxy, "cluster")
if err != nil {
SectionError(&sectionErrors, "proxy", err)
} else {
httpProxy := NestedString(proxy.Object, "spec", "httpProxy")
httpsProxy := NestedString(proxy.Object, "spec", "httpsProxy")
noProxy := NestedString(proxy.Object, "spec", "noProxy")
result["proxy"] = map[string]any{
"http_proxy_configured": httpProxy != "",
"https_proxy_configured": httpsProxy != "",
"no_proxy_configured": noProxy != "",
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/network.go` around lines 33 - 43, The readiness checker
currently embeds raw proxy config into result by reading GetResource(ctx, dc,
GVRProxy, "cluster") and putting NestedString(proxy.Object, "spec",
"httpProxy"/"httpsProxy"/"noProxy") into result; change this to avoid exposing
credentials/domains by replacing the raw strings with redacted summaries or
booleans (e.g., "http_proxy": {"present": true, "redacted": true, "length": N}
or simply {"present": true}) and/or sanitized metadata (e.g., host-only or
masked value), update the code that sets result["proxy"] (and any callers
expecting these keys) accordingly, and keep error handling via SectionError
unchanged; use the proxy variable, GVRProxy, GetResource and NestedString
locations to find the block to modify.

Comment thread test/cvo/readiness.go
"github.com/openshift/cluster-version-operator/test/util"
)

var _ = g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks`, func() {

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect existing Ginkgo labels in test suites.
rg -n -C2 'g\.Describe\(|g\.Context\(|g\.It\(|g\.Label\(' test --type go

Repository: openshift/cluster-version-operator

Length of output: 6506


🏁 Script executed:

# Check the full readiness.go file structure for all Ginkgo blocks and external endpoints
cat -n test/cvo/readiness.go

Repository: openshift/cluster-version-operator

Length of output: 9647


Add Ginkgo labels for this E2E test suite.

The test uses the correct Jira-formatted name but lacks the Ginkgo labels required for CI selection and filtering. Add the appropriate category label(s) to the g.Describe block to comply with E2E conventions. The pattern used by other tests in this directory (e.g., g.Label("Serial"), g.Label("OTA-1813")) shows how labels organize test categorization.

Example structure
-var _ = g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks`, func() {
+var _ = g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks`, g.Label("category-label"), func() {

Replace category-label with the appropriate category for this test (e.g., Serial, a bug reference, or feature category).

As per coding guidelines, "Use Ginkgo Labels to mark test categories (e.g., TechPreview, serial)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cvo/readiness.go` at line 20, The g.Describe block for the
cluster-version-operator readiness E2E test is missing Ginkgo labels; update the
g.Describe call (the Describe defined as var _ = g.Describe(`[Jira:"Cluster
Version Operator"] cluster-version-operator readiness checks`, func() { ... }))
to include g.Label calls for CI categorization (for example add
g.Label("Serial") and any relevant ticket/feature label like g.Label("OTA-1813")
or a TechPreview label) immediately inside the Describe block so the suite is
selectable/filterable by CI tooling.

Integrate CVO with the Lightspeed agent framework by creating Proposal
CRs when available updates are discovered. This includes OLM operator
lifecycle readiness checks, product lifecycle (PLCC) data via a shared
agentic-skills image, and proposal creation gated behind the
LightspeedProposals feature gate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@harche harche force-pushed the lightspeed-proposals branch from f5cc163 to 07f4de9 Compare April 22, 2026 17:15

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
pkg/readiness/client.go (2)

153-163: ⚠️ Potential issue | 🟠 Major

Return parse errors instead of treating them as equality.

CompareVersions still returns 0 when either version fails to parse, so callers cannot distinguish invalid input from equal versions. That can silently skew readiness decisions that depend on version ordering. This was already flagged and appears unresolved.

🛠️ Proposed local fix
-// CompareVersions compares two semver strings. Returns -1, 0, or 1.
-func CompareVersions(a, b string) int {
+// CompareVersions compares two semver strings. Returns -1, 0, or 1.
+func CompareVersions(a, b string) (int, error) {
 	va, err := semver.ParseTolerant(a)
 	if err != nil {
-		return 0
+		return 0, err
 	}
 	vb, err := semver.ParseTolerant(b)
 	if err != nil {
-		return 0
+		return 0, err
 	}
-	return va.Compare(vb)
+	return va.Compare(vb), nil
 }

Verify and update all callers to handle the new error path:

#!/bin/bash
# Description: Find CompareVersions callers and surrounding branch logic.
# Expectation: every caller explicitly handles the returned error before using the comparison result.
rg -n -C3 --type=go '\bCompareVersions\s*\('

As per coding guidelines, “When modifying code, check that nearby comments, kubernetes.io/description annotations, and doc strings still accurately describe the new behavior.”
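A caller under the new signature would look roughly like the following. This is a self-contained sketch, not the repository's code: the stdlib-only `parseTolerant` stand-in (which only accepts full `major.minor.patch` strings) substitutes for `semver.ParseTolerant`, and the caller branch in `main` is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTolerant is a stdlib-only stand-in for semver.ParseTolerant,
// handling plain "major.minor.patch" strings for illustration.
func parseTolerant(s string) ([3]int, error) {
	var v [3]int
	parts := strings.SplitN(strings.TrimPrefix(s, "v"), ".", 3)
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// CompareVersions returns -1, 0, or 1, surfacing parse failures
// instead of silently reporting equality.
func CompareVersions(a, b string) (int, error) {
	va, err := parseTolerant(a)
	if err != nil {
		return 0, err
	}
	vb, err := parseTolerant(b)
	if err != nil {
		return 0, err
	}
	for i := 0; i < 3; i++ {
		if va[i] < vb[i] {
			return -1, nil
		}
		if va[i] > vb[i] {
			return 1, nil
		}
	}
	return 0, nil
}

func main() {
	// A caller must now branch on the error before trusting the ordering.
	if cmp, err := CompareVersions("4.20.3", "4.21.0"); err != nil {
		fmt.Println("skipping version-ordering check:", err)
	} else if cmp < 0 {
		fmt.Println("update is a newer version")
	}
	if _, err := CompareVersions("not-a-version", "4.21.0"); err != nil {
		fmt.Println("parse error surfaced:", err)
	}
}
```

The key behavioral change: a readiness check that previously treated unparsable input as "equal" now gets an explicit error path to log or propagate.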

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/client.go` around lines 153 - 163, CompareVersions currently
masks parse errors by returning 0; change its signature from CompareVersions(a,
b string) int to CompareVersions(a, b string) (int, error) and return a non-nil
error when semver.ParseTolerant(a) or semver.ParseTolerant(b) fails instead of
returning 0; on success return va.Compare(vb) with nil error. Update the
function comment to document the new (int, error) return and then find and
update all callers of CompareVersions to handle the error path (e.g., via the
suggested rg search) before using the comparison result, and ensure any
kubernetes.io/description comments or docstrings near CompareVersions still
accurately describe the new behavior.

3-5: ⚠️ Potential issue | 🟡 Minor

Sort selector parts for deterministic output.

FormatLabelSelector still depends on Go map iteration order, so the same label map can produce different selector strings. This was already flagged and appears unresolved.

♻️ Proposed fix
 import (
 	"context"
+	"sort"
 	"strings"
 func FormatLabelSelector(labels map[string]string) string {
 	parts := make([]string, 0, len(labels))
 	for k, v := range labels {
 		parts = append(parts, k+"="+v)
 	}
+	sort.Strings(parts)
 	return strings.Join(parts, ",")
 }

As per coding guidelines, “When sorting or deduplicating collections, ensure stable ordering by including a tiebreaker field (e.g., sort by version then by name).”

Also applies to: 166-172
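Putting the pieces of the proposed fix together, the fully deterministic function is short enough to show whole. A minimal sketch, assuming the selector only uses `key=value` parts:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FormatLabelSelector builds "k1=v1,k2=v2" from a label map. Sorting the
// parts removes the dependence on Go's randomized map iteration order.
func FormatLabelSelector(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts) // keys are unique, so key order is the effective tiebreaker
	return strings.Join(parts, ",")
}

func main() {
	labels := map[string]string{"app": "cvo", "tier": "control-plane"}
	// Every call now yields the same string, regardless of iteration order.
	fmt.Println(FormatLabelSelector(labels)) // → app=cvo,tier=control-plane
}
```

Because map keys are unique, sorting the joined `key=value` strings already sorts by key first; a separate value tiebreaker only matters for multi-valued forms like `key in (v1,v2)`.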

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/client.go` around lines 3 - 5, FormatLabelSelector currently
builds the selector from a map and relies on Go's map iteration order; change it
to collect each "key=value" (or "key in (v1,v2)" parts) into a slice, sort that
slice deterministically using a stable comparator that first sorts by key and
then by value (as a tiebreaker), and then join the sorted parts to form the
selector string. Apply the same approach to the other selector-construction
logic around lines 166-172 (the other function that assembles selector parts) so
both functions produce stable, deterministic output.
🧹 Nitpick comments (4)
pkg/readiness/client_test.go (1)

109-126: Use strings.Contains; the "empty" case asserts nothing.

The inner triple loop re-implements strings.Contains. Also, the "empty" sub-case has contains: []string{}, so the inner loop body never executes — it's effectively a no-op (a dedicated assertion like got == "" would make the intent real).

Suggested cleanup
-	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
-			got := FormatLabelSelector(tt.labels)
-			for _, s := range tt.contains {
-				found := false
-				for i := 0; i <= len(got)-len(s); i++ {
-					if got[i:i+len(s)] == s {
-						found = true
-						break
-					}
-				}
-				if !found {
-					t.Errorf("FormatLabelSelector(%v) = %q, want to contain %q", tt.labels, got, s)
-				}
-			}
-		})
-	}
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := FormatLabelSelector(tt.labels)
+			if len(tt.contains) == 0 && got != "" {
+				t.Errorf("FormatLabelSelector(%v) = %q, want empty", tt.labels, got)
+			}
+			for _, s := range tt.contains {
+				if !strings.Contains(got, s) {
+					t.Errorf("FormatLabelSelector(%v) = %q, want to contain %q", tt.labels, got, s)
+				}
+			}
+		})
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/client_test.go` around lines 109 - 126, Replace the manual
substring search in the test loop with strings.Contains by importing "strings"
and using if !strings.Contains(got, s) to check each expected substring for
FormatLabelSelector; additionally, make the "empty" test case assert the actual
intended output (e.g., assert got == "" or whatever the expected string is)
instead of leaving contains: []string{} which makes the inner loop a
no-op—update the tests slice entry for the "empty" case to include the explicit
expected value and assert it (refer to the FormatLabelSelector call and the
tests variable in the test function).
pkg/readiness/checks_test.go (1)

13-37: Consider sharing the GVR→list-kind map with production code.

This gvrs map duplicates knowledge that should already exist alongside GVRClusterVersion, GVRClusterOperator, etc. declarations (probably pkg/readiness/client.go). If a new GVR constant is added and used by a check but not registered here, tests for that check will surface a confusing "no kind" error rather than a missing-fixture assertion. A small exported helper (e.g., readiness.ListKinds()) used by both the production code and the test fake would prevent that drift.
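The shared helper could look like the following sketch. It is self-contained for illustration: the local `gvr` struct stands in for `schema.GroupVersionResource`, and the two constants shown are examples mirroring (not quoting) the declarations in `pkg/readiness/client.go`.

```go
package main

import "fmt"

// gvr is a local stand-in for schema.GroupVersionResource so this
// sketch stays self-contained; the real helper would use that type.
type gvr struct{ Group, Version, Resource string }

// Example GVR constants mirroring those in pkg/readiness/client.go.
var (
	GVRClusterVersion  = gvr{"config.openshift.io", "v1", "clusterversions"}
	GVRClusterOperator = gvr{"config.openshift.io", "v1", "clusteroperators"}
)

// ListKinds is the single source of truth for GVR→list-kind mappings,
// consumed by both production code and the test fake, so a new GVR
// cannot be registered in one place but forgotten in the other.
func ListKinds() map[gvr]string {
	return map[gvr]string{
		GVRClusterVersion:  "ClusterVersionList",
		GVRClusterOperator: "ClusterOperatorList",
	}
}

func main() {
	// A test fake would feed this map straight into its scheme registration.
	for g, kind := range ListKinds() {
		fmt.Printf("%s/%s %s -> %s\n", g.Group, g.Version, g.Resource, kind)
	}
}
```

With this in place, `newFakeDynamicClient` drops its inline `gvrs` map and calls `readiness.ListKinds()` instead, so a check using an unregistered GVR fails the same way in tests as it would in production.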

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/checks_test.go` around lines 13 - 37, The test's gvrs map in
newFakeDynamicClient duplicates the production mapping of GVR→list-kind; add an
exported helper (e.g., ListKinds() returning
map[schema.GroupVersionResource]string) in the readiness package (alongside the
GVR constants in client.go), switch newFakeDynamicClient to call
readiness.ListKinds() instead of inlining gvrs, and update production code to
use the same helper so both runtime and tests share one source of truth and
avoid drift when new GVRs are added.
pkg/readiness/check.go (1)

83-151: LGTM, minor observability gap.

The concurrent orchestration with per-check timeout, error capture, and aggregated counts is clean. One small suggestion: when a check errors out (including timeout), nothing is logged at the RunAll level — callers embedding this JSON into a Proposal will see the _error field, but on-cluster debugging via CVO logs won't show which check failed or why. A single klog.V(4).Infof (or Warningf for non-timeout errors) inside the goroutine on the error branch would help operational triage without affecting output.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/check.go` around lines 83 - 151, When a check fails inside
RunAll (the goroutine that calls ch.Run with perCheckTimeout), add a klog emit
in the error branch so failures are visible in cluster logs: detect timeout vs
other errors (the err returned from ch.Run) and call klog.V(4).Infof for
timeout-related errors and klog.Warningf for non-timeout errors, include the
check name (ch.Name()), the error string, elapsed seconds, and any short snippet
of result data; keep this logging inside the same goroutine before populating
results[ch.Name()] so the existing CheckResult building (Status, Error, Data)
remains unchanged.
pkg/readiness/api_deprecations.go (1)

20-31: Brittle string-match for "resource not available" detection.

Using strings.Contains(err.Error(), "not found") to classify a non-fatal absence is fragile — error message wording can change across client-go versions, and "not found" also legitimately appears in unrelated errors (e.g., a missing namespace). Prefer typed checks against meta.IsNoMatchError(err) and apierrors.IsNotFound(err), which is exactly the pattern pkg/proposal/proposal.go (isNoMatchError) already uses in this PR.

🛡️ Proposed fix
 import (
 	"context"
 	"fmt"
-	"strings"
 
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
+	"k8s.io/apimachinery/pkg/api/meta"
 	"k8s.io/client-go/dynamic"
 )
@@
-		if strings.Contains(err.Error(), "not found") {
+		if meta.IsNoMatchError(err) || apierrors.IsNotFound(err) {
 			result["warning"] = "APIRequestCount resource not available"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/api_deprecations.go` around lines 20 - 31, Replace the brittle
string match on err.Error() in the APIRequestCount listing with proper typed
checks: when ListResources(ctx, dc, GVRAPIRequestCount, "") returns an error,
test it using meta.IsNoMatchError(err) and
k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) (or the repo helper
isNoMatchError if available) instead of strings.Contains; if either indicates
the resource/GVR is absent, set the same warning/blocker fields and return
result, otherwise propagate the error (the affected symbols are ListResources,
GVRAPIRequestCount and the error handling branch around that call).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/cvo/cvo.go`:
- Around line 1247-1252: When creating proposals for conditional updates, pass
combined context containing both recommended and conditional versions instead of
only au.Updates: inside the loop that calls
optr.proposalCreator.MaybeCreateProposal for each entry of
au.ConditionalUpdates, build a deduplicated slice that merges au.Updates and the
versions from au.ConditionalUpdates (or alternatively extend the buildRequest
signature to accept conditional updates) and pass that merged slice as the
"other updates" argument so buildRequest can list both recommended and
conditional alternatives when generating the proposal.

In `@pkg/proposal/proposal.go`:
- Around line 171-185: In Creator.readSystemPrompt, currently when the ConfigMap
exists but the "prompt" key is missing or not a string the code silently returns
"", so change the function to log that misconfiguration before returning: detect
the missing-or-wrong-type branch after unstructured.NestedMap(obj.Object,
"data") and emit a klog.V(2).Infof or klog.Warningf including
c.config.Namespace, c.config.PromptConfigMap and a clear message that the
"prompt" key is missing or not a string (reference symbols: readSystemPrompt,
Creator, unstructured.NestedMap); keep behavior of returning "" but ensure the
log is written to make the issue discoverable.

In `@pkg/readiness/cluster_conditions.go`:
- Around line 34-42: The code treats a missing Upgradeable condition
(conditions[ConditionUpgradeable]) as a zero-value and reports non-upgradeable;
change the logic that builds result["upgradeable"] (and any use of
summary.upgradeable) to first check whether the ConditionUpgradeable key exists
in the conditions map, and if absent default to upgradeable=true (or status
"True"/"Unknown" per your conventions), only setting upgradeable=false when the
condition is present and its Status == ConditionFalse; similarly, for
ConditionProgressing handle missing keys explicitly (keep default false if you
prefer) instead of relying on zero-value semantics so
result["update_in_progress"] accurately reflects an explicit Progressing=False
vs missing condition.

---

Duplicate comments:
In `@pkg/readiness/client.go`:
- Around line 153-163: CompareVersions currently masks parse errors by returning
0; change its signature from CompareVersions(a, b string) int to
CompareVersions(a, b string) (int, error) and return a non-nil error when
semver.ParseTolerant(a) or semver.ParseTolerant(b) fails instead of returning 0;
on success return va.Compare(vb) with nil error. Update the function comment to
document the new (int, error) return and then find and update all callers of
CompareVersions to handle the error path (e.g., via the suggested rg search)
before using the comparison result, and ensure any kubernetes.io/description
comments or docstrings near CompareVersions still accurately describe the new
behavior.
- Around line 3-5: FormatLabelSelector currently builds the selector from a map
and relies on Go's map iteration order; change it to collect each "key=value"
(or "key in (v1,v2)" parts) into a slice, sort that slice deterministically
using a stable comparator that first sorts by key and then by value (as a
tiebreaker), and then join the sorted parts to form the selector string. Apply
the same approach to the other selector-construction logic around lines 166-172
(the other function that assembles selector parts) so both functions produce
stable, deterministic output.

---

Nitpick comments:
In `@pkg/readiness/api_deprecations.go`:
- Around line 20-31: Replace the brittle string match on err.Error() in the
APIRequestCount listing with proper typed checks: when ListResources(ctx, dc,
GVRAPIRequestCount, "") returns an error, test it using meta.IsNoMatchError(err)
and k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) (or the repo helper
isNoMatchError if available) instead of strings.Contains; if either indicates
the resource/GVR is absent, set the same warning/blocker fields and return
result, otherwise propagate the error (the affected symbols are ListResources,
GVRAPIRequestCount and the error handling branch around that call).

In `@pkg/readiness/check.go`:
- Around line 83-151: When a check fails inside RunAll (the goroutine that calls
ch.Run with perCheckTimeout), add a klog emit in the error branch so failures
are visible in cluster logs: detect timeout vs other errors (the err returned
from ch.Run) and call klog.V(4).Infof for timeout-related errors and
klog.Warningf for non-timeout errors, include the check name (ch.Name()), the
error string, elapsed seconds, and any short snippet of result data; keep this
logging inside the same goroutine before populating results[ch.Name()] so the
existing CheckResult building (Status, Error, Data) remains unchanged.

In `@pkg/readiness/checks_test.go`:
- Around line 13-37: The test's gvrs map in newFakeDynamicClient duplicates the
production mapping of GVR→list-kind; add an exported helper (e.g., ListKinds()
returning map[schema.GroupVersionResource]string) in the readiness package
(alongside the GVR constants in client.go), switch newFakeDynamicClient to call
readiness.ListKinds() instead of inlining gvrs, and update production code to
use the same helper so both runtime and tests share one source of truth and
avoid drift when new GVRs are added.

In `@pkg/readiness/client_test.go`:
- Around line 109-126: Replace the manual substring search in the test loop with
strings.Contains by importing "strings" and using if !strings.Contains(got, s)
to check each expected substring for FormatLabelSelector; additionally, make the
"empty" test case assert the actual intended output (e.g., assert got == "" or
whatever the expected string is) instead of leaving contains: []string{} which
makes the inner loop a no-op—update the tests slice entry for the "empty" case
to include the explicit expected value and assert it (refer to the
FormatLabelSelector call and the tests variable in the test function).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 9e893ac2-6d45-4560-b5a8-b5d8d84bdbb4

📥 Commits

Reviewing files that changed from the base of the PR and between f5cc163 and 07f4de9.

⛔ Files ignored due to path filters (3)
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/openshift/api/features/features.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (30)
  • AGENTS.md
  • go.mod
  • install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml
  • install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml
  • install/0000_00_cluster-version-operator_47_lightspeed-crd-workflows.yaml
  • install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml
  • install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml
  • install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml
  • pkg/cvo/availableupdates.go
  • pkg/cvo/cvo.go
  • pkg/cvo/status_test.go
  • pkg/featuregates/featuregates.go
  • pkg/proposal/proposal.go
  • pkg/proposal/proposal_test.go
  • pkg/readiness/api_deprecations.go
  • pkg/readiness/check.go
  • pkg/readiness/check_test.go
  • pkg/readiness/checks_test.go
  • pkg/readiness/client.go
  • pkg/readiness/client_test.go
  • pkg/readiness/cluster_conditions.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/etcd_health.go
  • pkg/readiness/network.go
  • pkg/readiness/node_capacity.go
  • pkg/readiness/olm_lifecycle.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/readiness/operator_health.go
  • pkg/readiness/pdb_drain.go
  • test/cvo/readiness.go
✅ Files skipped from review due to trivial changes (8)
  • go.mod
  • pkg/cvo/availableupdates.go
  • install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml
  • pkg/cvo/status_test.go
  • install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml
  • pkg/readiness/network.go
  • pkg/readiness/olm_lifecycle_test.go
  • pkg/proposal/proposal_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • AGENTS.md
  • test/cvo/readiness.go
  • pkg/readiness/pdb_drain.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/olm_lifecycle.go

Comment thread pkg/cvo/cvo.go
Comment on lines +1247 to +1252
for _, u := range au.Updates {
optr.proposalCreator.MaybeCreateProposal(ctx, optr.release.Version, u.Version, "recommended", config.Spec.Channel, au.Updates, readinessJSON)
}
for _, u := range au.ConditionalUpdates {
optr.proposalCreator.MaybeCreateProposal(ctx, optr.release.Version, u.Release.Version, "conditional", config.Spec.Channel, au.Updates, readinessJSON)
}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect the MaybeCreateProposal signature and how the updates argument is used.
ast-grep --pattern $'func ($_ *Creator) MaybeCreateProposal($$$) {
  $$$
}'
rg -nP --type=go -C5 'MaybeCreateProposal\s*\(' pkg/proposal
rg -nP --type=go -C3 '\bbuildRequest\b|\bclassifyUpdate\b' pkg/proposal

Repository: openshift/cluster-version-operator

Length of output: 9919


🏁 Script executed:

cat -n pkg/proposal/proposal.go | sed -n '187,250p'

Repository: openshift/cluster-version-operator

Length of output: 2565


🏁 Script executed:

# Check the context around lines 1247-1252 and how au is populated
rg -n 'au\.Updates|au\.ConditionalUpdates' pkg/cvo/cvo.go -B 20 -A 5 | head -120

Repository: openshift/cluster-version-operator

Length of output: 2424


🏁 Script executed:

# Find the type definition for au (returned by getAvailableUpdates)
rg -n 'func.*getAvailableUpdates|type.*Updates' pkg/cvo/cvo.go | head -30
# Also look for where Updates and ConditionalUpdates are used together
rg -n 'ConditionalUpdates' pkg/cvo/ -B 3 -A 3

Repository: openshift/cluster-version-operator

Length of output: 32273


🏁 Script executed:

rg -n 'func SelectTarget' pkg/ -A 15

Repository: openshift/cluster-version-operator

Length of output: 1185


🏁 Script executed:

rg -n 'maybeCreateLightspeedProposals|Test.*Lightspeed' pkg/cvo/ -A 20 | head -80

Repository: openshift/cluster-version-operator

Length of output: 2858


🏁 Script executed:

# Search for tests that specifically test proposal creation with conditional updates
rg -n 'TestMaybeCreateLightspeed|TestMaybeCreate.*Proposal|ConditionalUpdates.*Proposal|Proposal.*Conditional' pkg/cvo/ -B 2 -A 10

Repository: openshift/cluster-version-operator

Length of output: 60


🏁 Script executed:

# Check the proposal package tests more broadly
rg -n 'func Test' pkg/proposal/proposal_test.go | head -20
# Also check for any comments explaining the design
rg -n 'conditional.*recommended|recommended.*conditional' pkg/proposal/ -B 2 -A 2

Repository: openshift/cluster-version-operator

Length of output: 2400


🏁 Script executed:

# Check if there are any comments or design decisions about how updates are passed to proposals
rg -n 'Other.*recommended|context|graph' pkg/proposal/proposal.go -B 2 -A 2
# Also verify the structure of ConditionalUpdate to understand the difference from Release
rg -n 'type ConditionalUpdate|type Release' --type=go | grep -E 'configv1|proposal'

Repository: openshift/cluster-version-operator

Length of output: 1093


🏁 Script executed:

# Let me check if there's any discussion in comments or in how evaluateConditionalUpdates works
# to understand if conditional updates should be part of the context
rg -n 'evaluateConditionalUpdates' pkg/cvo/availableupdates.go -B 5 -A 20 | head -50

Repository: openshift/cluster-version-operator

Length of output: 2365


🏁 Script executed:

# Check if there are any other calls to MaybeCreateProposal or related patterns in the codebase
rg -n 'MaybeCreateProposal|buildRequest' pkg/ --type=go | grep -v test | grep -v '//'

Repository: openshift/cluster-version-operator

Length of output: 832


Conditional update proposals should include context about other available updates, both recommended and conditional.

Line 1251 passes au.Updates (recommended only) when creating proposals for conditional targets. The buildRequest function (lines 210–231) uses this parameter to list "Other recommended versions available" in the proposal context. For conditional targets, this means the AI evaluates the conditional update in isolation from other conditional alternatives—it only sees recommended versions as context.

Consider passing a combined/deduplicated slice containing both au.Updates and the versions from au.ConditionalUpdates for the conditional loop, or extend buildRequest to accept conditional updates separately and include them in the proposal request.
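The merge could be as small as the sketch below. The pared-down `Release`/`ConditionalUpdate` structs and the `mergeUpdateContext` name are illustrative stand-ins for the configv1 types and whatever helper the PR would actually add:

```go
package main

import "fmt"

// Release and ConditionalUpdate are pared-down stand-ins for the
// configv1 types; only the Version field matters for this sketch.
type Release struct{ Version string }
type ConditionalUpdate struct{ Release Release }

// mergeUpdateContext combines recommended and conditional targets into one
// deduplicated, order-preserving slice suitable for the "other updates"
// argument of buildRequest.
func mergeUpdateContext(recommended []Release, conditional []ConditionalUpdate) []Release {
	seen := make(map[string]bool, len(recommended)+len(conditional))
	merged := make([]Release, 0, len(recommended)+len(conditional))
	for _, r := range recommended {
		if !seen[r.Version] {
			seen[r.Version] = true
			merged = append(merged, r)
		}
	}
	for _, cu := range conditional {
		if !seen[cu.Release.Version] {
			seen[cu.Release.Version] = true
			merged = append(merged, cu.Release)
		}
	}
	return merged
}

func main() {
	recommended := []Release{{"4.21.5"}, {"4.21.6"}}
	conditional := []ConditionalUpdate{{Release{"4.22.0"}}, {Release{"4.21.6"}}}
	fmt.Println(mergeUpdateContext(recommended, conditional))
	// → [{4.21.5} {4.21.6} {4.22.0}]
}
```

The conditional-update loop would then pass the merged slice in place of bare `au.Updates`, so the agent sees every alternative target, not just the recommended ones. The alternative design — extending `buildRequest` to take conditional updates as a separate parameter — preserves the recommended/conditional distinction in the proposal text at the cost of a wider signature.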

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cvo/cvo.go` around lines 1247 - 1252, When creating proposals for
conditional updates, pass combined context containing both recommended and
conditional versions instead of only au.Updates: inside the loop that calls
optr.proposalCreator.MaybeCreateProposal for each entry of
au.ConditionalUpdates, build a deduplicated slice that merges au.Updates and the
versions from au.ConditionalUpdates (or alternatively extend the buildRequest
signature to accept conditional updates) and pass that merged slice as the
"other updates" argument so buildRequest can list both recommended and
conditional alternatives when generating the proposal.

Comment thread pkg/proposal/proposal.go
Comment on lines +171 to +185
func (c *Creator) readSystemPrompt(ctx context.Context) string {
if c.config.PromptConfigMap == "" {
return ""
}
obj, err := c.client.Resource(configMapGVR).Namespace(c.config.Namespace).Get(ctx, c.config.PromptConfigMap, metav1.GetOptions{})
if err != nil {
klog.V(4).Infof("Could not read system prompt ConfigMap %s/%s: %v", c.config.Namespace, c.config.PromptConfigMap, err)
return ""
}
data, _, _ := unstructured.NestedMap(obj.Object, "data")
if prompt, ok := data["prompt"].(string); ok {
return prompt
}
return ""
}


⚠️ Potential issue | 🟡 Minor

Silent empty prompt obscures misconfiguration.

When the ConfigMap exists but is missing the prompt key (or the key isn't a string), readSystemPrompt silently returns "" and the final proposal is created without any system-prompt context — a config bug that's invisible until an operator reads the produced Proposal.spec.request by hand. Log at klog.V(2) or klog.Warningf for this branch to make the misconfiguration discoverable.

🛡️ Proposed fix
-	data, _, _ := unstructured.NestedMap(obj.Object, "data")
-	if prompt, ok := data["prompt"].(string); ok {
-		return prompt
-	}
-	return ""
+	data, _, _ := unstructured.NestedMap(obj.Object, "data")
+	if prompt, ok := data["prompt"].(string); ok && prompt != "" {
+		return prompt
+	}
+	klog.Warningf("System prompt ConfigMap %s/%s missing non-empty 'prompt' key; proposals will be created without a system prompt",
+		c.config.Namespace, c.config.PromptConfigMap)
+	return ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/proposal/proposal.go` around lines 171 - 185, In
Creator.readSystemPrompt, currently when the ConfigMap exists but the "prompt"
key is missing or not a string the code silently returns "", so change the
function to log that misconfiguration before returning: detect the
missing-or-wrong-type branch after unstructured.NestedMap(obj.Object, "data")
and emit a klog.V(2).Infof or klog.Warningf including c.config.Namespace,
c.config.PromptConfigMap and a clear message that the "prompt" key is missing or
not a string (reference symbols: readSystemPrompt, Creator,
unstructured.NestedMap); keep behavior of returning "" but ensure the log is
written to make the issue discoverable.

Comment on lines +34 to +42
upgradeable := conditions[ConditionUpgradeable]
result["upgradeable"] = map[string]any{
"status": upgradeable.Status,
"reason": upgradeable.Reason,
"message": upgradeable.Message,
}

progressing := conditions[ConditionProgressing]
result["update_in_progress"] = progressing.Status == ConditionTrue


⚠️ Potential issue | 🟠 Major

Absent Upgradeable condition is reported as not upgradeable.

conditions[ConditionUpgradeable] yields a zero-value struct when the condition is missing, so result["upgradeable"].status becomes "" and summary.upgradeable becomes false. Per OpenShift convention, the absence of Upgradeable means the cluster is upgradeable (only Upgradeable=False blocks). With the current code, a healthy cluster that simply hasn't set the condition will be reported to the agent as non-upgradeable, which flips the downstream decision.

Consider treating missing/absent as "upgradeable = true" (or at least "unknown") and only setting false when the condition is explicitly False. Same consideration applies if you care about distinguishing "no Progressing condition" from "Progressing=False", though defaulting that one to false is reasonable.
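A presence-checked version of that branch might look like the sketch below. It is self-contained for illustration: the `Condition` struct and `summarize` helper are stand-ins for the real condition type and the code around lines 34–42, and the exact defaults (`"True"` vs `"Unknown"`, the `blocked` field) are choices, not the PR's:

```go
package main

import "fmt"

// Condition is a pared-down stand-in for the cluster condition type.
type Condition struct{ Status, Reason, Message string }

const (
	ConditionUpgradeable = "Upgradeable"
	ConditionProgressing = "Progressing"
	ConditionTrue        = "True"
	ConditionFalse       = "False"
)

// summarize treats an absent Upgradeable condition as upgradeable (per
// OpenShift convention, only an explicit Upgradeable=False blocks) and
// only reports update_in_progress for an explicit Progressing=True.
func summarize(conditions map[string]Condition) map[string]any {
	result := map[string]any{}

	if upgradeable, ok := conditions[ConditionUpgradeable]; ok {
		result["upgradeable"] = map[string]any{
			"status":  upgradeable.Status,
			"reason":  upgradeable.Reason,
			"message": upgradeable.Message,
			"blocked": upgradeable.Status == ConditionFalse,
		}
	} else {
		// Missing condition: default to upgradeable rather than the
		// zero-value "" status that previously read as non-upgradeable.
		result["upgradeable"] = map[string]any{"status": ConditionTrue, "reason": "ConditionNotSet", "blocked": false}
	}

	progressing, ok := conditions[ConditionProgressing]
	result["update_in_progress"] = ok && progressing.Status == ConditionTrue
	return result
}

func main() {
	// A healthy cluster that never set Upgradeable is still upgradeable.
	fmt.Println(summarize(map[string]Condition{}))
}
```

The comma-ok lookup is the whole fix: it distinguishes "condition absent" from "condition present with empty status", which Go's zero-value map access conflates.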

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/readiness/cluster_conditions.go` around lines 34 - 42, The code treats a
missing Upgradeable condition (conditions[ConditionUpgradeable]) as a zero-value
and reports non-upgradeable; change the logic that builds result["upgradeable"]
(and any use of summary.upgradeable) to first check whether the
ConditionUpgradeable key exists in the conditions map, and if absent default to
upgradeable=true (or status "True"/"Unknown" per your conventions), only setting
upgradeable=false when the condition is present and its Status ==
ConditionFalse; similarly, for ConditionProgressing handle missing keys
explicitly (keep default false if you prefer) instead of relying on zero-value
semantics so result["update_in_progress"] accurately reflects an explicit
Progressing=False vs missing condition.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 23, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 23, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants